INTRODUCTION


This volume is a report of the results of work done by the
Institute  of  Educational  Research  of  Teachers  College, 
Columbia  University,  to  provide  tests  for  use  in  the  vocational 
guidance of children in their early teens. The particular problem
was to select or devise tests (1) that would be of value in
predicting fitness for various careers, (2) that could be given a)
to  children  in  fairly  large  groups,  b)  by  any  intelligent  teacher  or 
social  worker  who  would  give  a  reasonable  amount  of  time  to 
training  for  the  work,  and  c)  within  a  time  limit  of  three  hours; 
and  (3)  that  could  be  prepared  and  scored  cheaply. 

It  seemed  best  to  divide  the  three  hours  of  test  time  (two  hours 
of  actual  working  time  for  the  children)  somewhat  equally  among 
three  abilities:  (1)  the  ability  to  deal  with  ideas  and  symbols 
for  ideas;  (2)  the  ability  to  deal  with  things  and  mechanisms;  and 
(3)  the  ability  to  deal  with  clerical  items  and  procedures.  (We 
made  no  attempt  to  test  the  ability  to  deal  with  people.)  These 
three  abilities  correspond  roughly  to  three  of  the  trunk  lines  of 
vocational  activities  which  a  fifteen-year-old  may  enter.  He 
may  stay  in  school,  or  he  may  learn  a  trade,  or  he  may  do  office 
work.  A  fourth  main  line  is  selling.  Tests  to  predict  fitness  for 
selling  are  being  made  the  subject  of  extended  studies  by  the 
Carnegie  Institute  of  Technology. 

The  tests  finally  chosen  are  as  follows: 

"Ability with ideas":
    The I.E.R. Arith.-Re. Test or any of the standard tests of
    general intelligence.

"Ability with things":
    For boys, the Stenquist Assembly Test.
    For girls, the I.E.R. Assembly Test.

"Ability with clerical items and procedures":
    a) High level, the I.E.R. General Clerical Test, C-1.
    b) Lower level, the I.E.R. Test C-2.

These  can  be  given  to  a  group  of  one  hundred  children  within  the 
time specified, at a money cost of about thirty dollars for materials,
and a time cost of ten hours of one trained person and twenty
hours  of  each  of  four  slightly  trained  helpers. 
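
The cost figures just quoted can be reduced to a per-child basis. The following is only a small arithmetic sketch; the variable names are illustrative and the inputs are the figures stated in the text.

```python
# Figures quoted in the text for testing a group of one hundred children.
children = 100
materials_dollars = 30.0   # "about thirty dollars for materials"
trained_hours = 10         # one trained person
helpers = 4                # slightly trained helpers
helper_hours_each = 20     # hours given by each helper

# Total staff time and the cost per child tested.
total_person_hours = trained_hours + helpers * helper_hours_each
cost_per_child = materials_dollars / children
hours_per_child = total_person_hours / children

print(total_person_hours)  # person-hours for the whole group
print(cost_per_child)      # dollars of materials per child
print(hours_per_child)     # person-hours of staff time per child
```

On these figures the program calls for ninety person-hours in all, or somewhat less than one hour of staff time and about thirty cents of materials per child.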

We cannot as yet determine how great value the child's record
in  these  tests  will  have  in  predicting  his  fitness  for  vocations, 
general  or  special,  or  in  guiding  him  in  the  choices  which  he  has 
to  make.  The  only  satisfactory  way  to  determine  their  value  is 
by actual trial and prolonged study of the correspondence between
the predictions made by the test scores and the actual
future  life  histories  of  the  children  tested.  A  thousand  boys 
and  a  thousand  girls  of  the  graduating  elementary  school  grade 
are  now  being  tested,  and  will  be  followed  in  school  and  industry 
as  fully  and  as  far  as  possible. 

We have, however, done what was feasible to estimate the predictive
value of these tests in advance, first by investigations of
the extent to which the separate tests do measure different features
of human nature, and second by checking them against
criteria  of  vocational  success.  The  different  studies  made  for 
this  purpose  are  described  in  detail  in  the  body  of  the  report. 
The  gist  of  our  findings  is  as  follows: 

Our test for "ability with ideas" is satisfactory. It does correspond
with success in school work and book-learning in general.
A boy or girl who scores well in it is found to have done well
and  to  be  doing  well  in  book-learning.  By  taking  more  time  and 
repeating  the  test,  the  correspondence  and  prophecy  can  be  made 
still  more  precise.  It  has  the  advantage  over  previous  tests  of 
ability  with  ideas  of  being  far  less  subject  to  special  practice  or 
unfair  coaching.  It  can  be  used  helpfully  along  with  any  of  these. 

The  boys'  test  for  "ability  with  things"  seems  as  satisfactory 
as  any  that  is  likely  to  be  devised  with  present  knowledge.  It 
measures  a  distinct,  and  probably  an  important,  feature  of  human 
ability.  The  score  obtained  in  it  corresponds  well  with  success 
in  shop  work  in  schools.  The  same  is  true  of  the  girls'  test  for 
"ability  with  things."  We  have  been  unable  to  check  either  test 
against  actual  industrial  success  "on  the  job"  or  to  ascertain 
how  free  they  are  from  disturbing  influences  from  special  practice 
or  coaching.  Probably  it  will  be  very  hard  to  make  any  tests  of 
mechanical  ability  that  are  not  susceptible  to  these  influences. 
We  are  confident,  however,  that  any  competent  psychologist  or 
employment  manager  or  vocational  counsellor  would  consider  the 
scores  attained  in  these  tests  a  valuable  part  of  a  personnel  record 
for  boys  and  girls  in  the  early  teens. 

The  two  tests  of  "ability  with  clerical  items"  have  been  the 
subject of extended study, but the results are not clear. The
higher-level  test  is  indicative  (when  its  elements  are  properly 
weighted) of ability to succeed in training for stenography, typewriting,
and bookkeeping in business schools. It is somewhat,
but  less,  indicative  of  success  among  actual  office  workers.  Its 
value  in  both  of  these  respects  may,  however,  be  in  considerable 
measure  due  to  its  relation  to  general  intelligence;  and  in  children 
from  thirteen  to  fifteen  it  seems  to  be  very  largely  a  test  of 
general  intelligence. 

The  lower-level  clerical  test  measures  powers  that  are  more 
distinct  from  general  intelligence.  It  indicates  success  among 
actual office workers about as closely as the higher-level test.

It  is  impossible  to  decide  from  the  previous  work  by  Thurstone, 
Ruggles,  Thorndike  and  others,  how  far  the  mental  abilities  which 
function in clerical work are simply lesser degrees of those required
for success in schools, professions and thought-work in
general,  and  how  far  they  are  different  specialized  abilities. 
The  present  investigation  leaves  the  question  still  open.  An 
answer  should  come  from  the  future  careers  of  the  two  thousand 
children  being  tested  and  followed. 

At  all  events,  the  two  clerical  tests  are  worth  the  slight  time 
and  expense  required  to  give  them,  because  the  higher-level  test 
is  a  useful  check  on  the  test  for  ability  with  ideas,  and  because 
low  scores  in  both  mean  deficiency  in  ability  for  clerical  work 
by  any  reasonable  hypothesis.  One  of  the  greatest  services  of 
vocational  guidance  to  children  from  thirteen  to  sixteen  is  to 
direct  away  from  commercial  high  school,  business  colleges,  and 
office  work  those  who  have  little  or  no  chance  of  usefulness  and 
happiness  there.  It  will  be  very  much  safer  to  do  this  by  the  aid 
of  the  clerical  tests  than  on  the  basis  of  an  intelligence  test  alone. 

In view of the practical conditions under which vocational guidance
is now, and probably for some years will be, given, we prepared
tests for a three-hour period. Our studies show beyond
question  that  a  much  longer  time  is  desirable.  We  can  measure 
a  fourteen-year-old's  physical  stature  in  a  few  seconds  and  with 
an error of only one per cent of the difference between the shortest
and the tallest of his age, but to measure his stature in general
intelligence  with  an  error  of  one  per  cent  of  the  difference  between 
the  dullest  and  the  brightest  fourteen-year-old  would  require 
at  least  three  hours  and  probably  more. 

Still more time seems to be needed for a precise measure of
ability  with  things  and  mechanisms.  A  thirty-minute  test  is 
enormously  better  than  nothing,  but  it  is  also  enormously  worse 
than  an  accurate  measure.  A  wide  sampling  of  tasks  is  needed 
because  a  boy  or  girl  may  have  ability  in  one  sort  of  mechanical 
task  and  not  in  another. 

A  sampling  of  days  is  needed  because  a  child  has  his  ups  and 
downs.  Time  is  also  required  to  make  sure  that  he  understands 
what  he  is  to  try  to  do,  and  to  free  him  from  fear,  excitement, 
and  confusion.  It  is  sound  practice  to  give  in  every  case  at  least 
two  tests  for  any  ability,  preferably  on  two  different  days;  to 
this  end  we  have  chosen  or  devised  tests  which  can  be  extended 
by  alternative  forms  of  equal  difficulty  and  like  significance. 

Unfortunately, limitations of time and funds and the reluctance
of those in charge of schools, factories, and offices to set
aside  the  required  testing  time,  have  prevented  us  from  following 
this  practice  throughout  in  our  own  studies.  We  have  given  few 
retests,  though  we  have  given  them  wherever  it  was  feasible. 

In  connection  with  preparing  the  tests  and  finding  out  what 
may  be  expected  of  them  in  the  measurement  of  fitness  for  or 
the  prediction  of  success  in  various  vocations,  we  have  had  to 
consider  many  facts  and  problems,  including  technical  matters 
of  arrangement,  administration  and  scoring,  mathematical 
methods  in  multiple  correlation,  and  certain  general  principles  of 
mental  measurement  and  of  vocational  guidance.  Our  findings 
concerning  these  will  be  of  interest  to  all  students  who  seek 
mastery  in  this  field. 


CHAPTER I


THE  PROBLEM 

The scope of inquiry of the research has been limited to a determination
of the value of certain instruments, particularly tests,
which  have  been  either  used  or  suggested  as  desirable  instruments 
upon  which  to  base  vocational  advice.  Additional  tests  have 
been  devised  to  meet  needs  discovered  during  the  investigation. 
In  the  investigation  covered  by  this  report  no  extensive  attempt 
has  been  made  to  give  vocational  advice  nor  to  determine  the 
efficiency  of  any  advice  given,  nor  even  primarily  to  devise  a 
practical  working  basis  for  giving  such  advice. 

A  study  of  the  literature  shows  that  investigators  in  the  past 
have  devoted  their  attention  almost  exclusively  to  two  types  of 
vocational  test  investigation. 

The  first  of  these,  probably  most  adequately  typified  by  the 
work of the Army Trade Test Division, has dealt with the efficiency
of tests in measuring acquired vocational proficiency as a
basis  of  probable  fitness  for  an  immediate  position  in  industry. 
In  content  such  tests  have  not  all  been  primarily  trade  tests.  A 
great  many  other  varieties  of  tests  have  also  been  tried  out  by 
other investigators. One of the most notable pieces of such work
is the extensive research reported by Dr. Link in his book on
Employment Psychology. He makes use of psychological tests
specially selected by means of their correlations with demonstrated
degrees of efficiency when administered to workers of known
ability, to determine what tests should be used in the employment
office to select applicants who presumably would develop
worthwhile  industrial  capacity. 

The  second  trend  is  but  an  amplification  in  degree  of  the  work 
of  Link,  wherein  tests  selected  by  varying  means,  frequently  by 
random  selection,  have  been  tried  out  upon  groups  of  workers; 
the efficiency of the tests in predicting the known different degrees
of skill of the workers being then reported. Such tests vary
in content from those which, by analogy, resemble the occupational
activity, to tests which are designed to measure general
intelligence,  reading,  or  other  strictly  educational  content. 

A common feature of practically all such tests has been that
they  were  administered  to  the  test  subjects  at  a  period  after  they 
had  already  gained  some  vocational  proficiency.  The  validity 
of  the  conclusion  that  such  tests  are  of  high  value,  or  otherwise, 
depends  upon  the  validity  of  the  assumption  that  similar  results 
would have been obtained had the tests been given as true prognosis,
or hiring, tests at the time of entrance of the subject
to the occupation. We should examine, then, the following more
or less plausible hypotheses of the relationship of test scores to
vocational proficiency: (1) that the proficiency exhibited on tests
given at some date after entrance to the occupation may reflect
almost solely acquired training, a condition which would ensure
high correlations of vocational proficiency with tests which
measure for the most part progress rather than
native capacity; (2) that inasmuch as workers of varying lengths
of  experience  are  frequently  chosen  for  subjects,  the  correlations 
of  tests  with  efficiency  might  be  high  because  of  the  fact  that  the 
tests  measure  for  the  most  part  varying  amounts  of  experience  or 
practice  and  that  such  correlations  would  not  hold  were  all 
subjects  of  the  same  experience  level;  (3)  that  the  reverse  might 
be  true,  that  with  equal  amounts  of  experience  the  correlations 
of  tests  and  efficiency  might  be  higher  than  with  unequal  amounts; 
(4)  that  the  relative  scores  of  subjects  at  the  time  of  the  tests  are 
substantially  the  same  as  they  would  have  been  at  the  time  of 
entrance  to  the  occupation  and  that  consequently  the  correlations 
would  be  just  as  valid  at  entrance  as  at  the  time  of  the  test;  in 
other  words,  that  there  is  no  practice  effect;  (5)  that  the  tests 
measure  native  capacity  only  and  consequently  are  quite  valid 
for  predicting  potential  progress;  (6)  that  inasmuch  as  elimination 
of  a  certain  percentage  of  the  "unfit"  has  presumably  been  made 
before the time of administering the tests, the correlations of the
"survivors" are smaller than they would have been if computed
upon  the  "applicants,"  and  consequently  the  published  validity 
coefficients  are  very  conservative  judgments  of  the  real  value  of 
the  tests. 

In  the  absence  of  conclusive  evidence  bearing  upon  the  above 
possibilities,  it  is  highly  desirable  to  test  a  large  group  of  subjects 
previous  to  their  entrance  to  industry,  under  actual  conditions 
of  guidance,  and  to  follow  up  the  careers  of  such  pupils  in  order 
to  determine  whether  the  tests  have  functioned  to  advantage. 
This work will be carried on during the coming year with the
assistance  of  an  additional  grant  made  by  the  Commonwealth 
Fund. 

Vocational  and  educational  guidance,  to  be  most  effective, 
should  be  given  at  some  period  in  the  elementary  school.  There 
still  remains  the  question  of  whether  or  not  tests  given  during 
the  elementary  school  period  will  predict  capabilities  after  the 
pupils  have  grown  up  and  entered  industry,  i.e.,  at  what  age  does 
capacity develop to the point where its future course can be successfully
predicted? It might possibly be the case that tests
would predict vocational capacities if administered after the subjects
had passed the age at which intelligence is popularly supposed
to cease developing, but that the test would be useless if
given  before  this  age  had  been  reached.  This  view  would  require 
that  specialized  aptitudes  be  developed  after  general  intelligence 
reaches  its  maximum.  Some  light  would  also  be  expected  to  be 
thrown  upon  the  answer  to  this  question  by  an  extended  follow- 
up  investigation  which  would  continue  over  a  period  of  years. 

It  has  been  the  specific  purpose  of  this  inquiry  to  investigate 
the  value,  as  a  means  of  predicting  school  and  vocational  ability 
of  people  already  occupied  in  those  vocations,  of  the  following  five 
scales:  The  Stenquist  Mechanical  Assembly  Test;  the  I.E.R. 
Girls'  Assembly  Test;  the  I.E.R.  General  Clerical,  or  High  Level 
Business Test; the I.E.R. Clerical Test C-2, or Low Level Business
Test; and an intelligence scale composed of arithmetic and
reading, to be described later. The use of these tests in a vocational
guidance program should obey the testing principle of
securing  the  maximum  predictive  value  with  minimum  exertion. 
This  will  be  accomplished  most  easily  whenever  adequate  criteria 
of  future  progress  ability  in  a  vocation  have  been  secured,  by 
weighting  each  of  the  tests  above  named  (or  others  which  may 
be  used  in  a  vocational  guidance  program)  according  to  its 
independent  contribution  to  the  particular  vocational  criterion 
of  the  vocation  for  which  one  is  attempting  to  determine  a  given 
individual's  fitness.  If  it  should  be  true  that  high  fitness  for  a 
clerical  vocation,  other  things  being  equal,  means  low  fitness 
for  a  mechanical  one,  a  negative  weight  of  the  mechanical  test 
score  in  a  clerical  scale  would  be  the  statistical  outcome  of  the 
application  of  the  method  customarily  used  in  such  cases.  This 
would  yield  a  marked  differentiation  between  the  different 
occupational groups. In any case, each test could be weighted
according  to  its  contribution  in  predicting  each  of  a  great  many 
different  criteria  of  occupational  success.  One  advantage  to  be 
gained  is  that  one  could  avail  himself  almost  without  additional 
effort  of  the  greater  predictive  value  of  additional  tests.  Two 
tests  are  always  better  than  one,  if  properly  weighted  with  respect 
to  each  other.  Save  by  the  merest  chance,  no  test  correlates  zero 
with  an  occupational  criterion,  the  usual  expectancy  being  that 
the  correlations  will  be  positive  with  almost  any  type  of  test 
administered. 
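
The weighting described above can be illustrated for the simplest case of two tests and one criterion. The sketch below uses the ordinary two-predictor multiple regression formulas for standardized scores; the function name and the three correlations fed to it are hypothetical and are not taken from the data of this report.

```python
import math

def two_test_weights(r1c, r2c, r12):
    """Standardized regression weights and multiple R for two tests.

    r1c, r2c: correlations of test 1 and test 2 with the criterion.
    r12: correlation between the two tests.
    """
    denom = 1.0 - r12 ** 2
    beta1 = (r1c - r2c * r12) / denom
    beta2 = (r2c - r1c * r12) / denom
    # Squared multiple correlation is the weighted sum of the validities.
    multiple_r = math.sqrt(beta1 * r1c + beta2 * r2c)
    return beta1, beta2, multiple_r

# Hypothetical values: a clerical test correlating .50 with a clerical
# criterion, a mechanical test correlating .10 with it, and the two
# tests correlating .40 with each other.
b_clerical, b_mechanical, multiple_r = two_test_weights(0.50, 0.10, 0.40)
print(round(b_clerical, 3), round(b_mechanical, 3), round(multiple_r, 3))
```

With these assumed figures the mechanical test receives a small negative weight, and the weighted pair correlates slightly higher with the criterion than the clerical test alone (about .51 against .50). A negative weight of this kind can arise even when the test's own correlation with the criterion is positive, provided that correlation is small relative to the overlap between the two tests.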

A clerical test will thus predict to some extent ability to progress
in a rather mechanical vocation and vice versa; so likewise
will  a  reading  test  predict  to  some  extent  one's  ability  in  school 
work,  in  a  mechanical  vocation,  or  in  the  most  varied  or  the 
most  routine  clerical  work.  At  the  present  time  our  criteria  are 
ordinarily  quite  too  inaccurate  to  justify  more  than  tentative 
conclusions.  It  has  been  our  purpose  to  attempt  to  determine 
some  of  the  basic  relationships  involved  in  the  above  named  tests. 
Refined  statistical  techniques  have  been  used  where  desirable, 
and crude comparisons where economical; in some cases we have
not  hesitated  at  hazarding  guesses  where  sufficient  evidence  is 
not  available. 

If  we  should  come  to  a  valid  conclusion  regarding  the  extent 
to  which  high  mechanical  capacity  means  likewise  high  clerical 
capacity or high academic school capacity and vice versa, the results
would be of tremendous value for vocational guidance. If
we  could  settle  the  question  of  to  what  extent  one  must  have  a 
high  degree  of  general  intelligence  or  of  education  in  order  to 
enter  the  different  levels  of  clerical  work,  that  likewise  would 
be an important conclusion. The account of the various researches
undertaken in an attempt to settle some of these questions
will be found in the following pages.

It  is  assumed  that  vocational  guidance  is  a  function  of  the 
public elementary school for several reasons: (1) A pupil does
not ordinarily enter a vocation until after his compulsory or voluntary
education is completed; (2) guidance should be given in
advance  of  the  immediate  situation,  such  as  school  elimination, 
which  plunges  the  boy  or  girl  into  industry;  (3)  the  facts  of  school 
elimination  show  that  if  guidance  is  not  given  in  the  elementary 
school, a majority of boys and girls will choose a vocation without
such advice, since but a small portion enter high school;
(4)  early  differentiation  of  school  work  in  the  case  of  some  boys 
and  girls  seems  desirable;  (5)  the  elementary  school  has  control 
at some time over all boys and girls, has their trust and confidence,
can and should supply training in abilities in which it
may  be  noted  the  individual  is  not  as  well  developed  as  he  should 
be,  and  is  financially  disinterested;  (6)  the  elementary  school 
collects  more  valuable  data  on  the  abilities  of  growing  children 
than  any  other  agency. 


CHAPTER II


TESTS  OF  ABILITY  WITH  IDEAS  AND  SYMBOLS 

Persons  giving  vocational  guidance  to  children  from  13  to  16 
years  of  age  will  as  a  rule  have  access  to  the  children's  school 
records.  Such  facts  as  the  grade  reached  at  a  certain  age  or 
within  a  certain  number  of  years  of  schooling,  the  standing 
attained  in  comparison  with  others  in  the  teacher's  estimation, 
and  the  scores  made  in  standard  tests  of  educational  achievement, 
— these  and  other  matters  of  school  record  are  predictive  of  later 
success  in  school  and  to  some  extent  in  vocations.  This  may  be 
illustrated  from  the  work  of  Kelley,1  Miles,2  and  Ross.3 

Kelley  1  had  as  the  object  of  his  investigation,  "the  utilization 
of  measures  obtainable  under  ordinary  classroom  conditions,  with 
whatever  errors  may  be  inevitable,  for  whatever  they  actually 
demonstrate  themselves  to  be  worth  as  evidence  of  the  capacity 
it  is  desired  to  measure."  He  concludes  (p.  84):  "It  will  be 
found  that  having  once  initiated  a  guidance  bureau,  the  demands 
upon  it  will  be  positive  and  innumerable — many  of  them  extrava- 
gant. In  the  attempt  to  meet  these  demands,  and  to  meet  them 
on  the  spot  and  without  a  moment's  delay,  one  of  the  richest 
sources  of  information  is  likely  to  be  only  very  partially  utilized. 
Reference  is  made  to  that  product  accumulated  by  every  pupil — 
school  grades." 

1 Kelley, T. L. Educational Guidance. 116 pp. Teachers College Contributions
to Education, No. 71, 1914.

2 Miles, W. R. Comparison of Elementary and High School Grades. Univ. of Iowa
Studies in Education, Vol. I, No. 1.

3 Ross, C. C. "The Diagnostic Value of Individual Record Cards." To appear
shortly in the Journal of Administration and Supervision.

In the case of 59 pupils whose marks were available from the
fourth grade through the first year of high school, he finds that
the average grades in the first year high school may be predicted
from a regression weighting of the average marks in grades 4, 5, 6,
7, 8, to the extent of r = .79 ± .03. In specific subjects, first-year
high school English marks may be predicted from English marks
for grades 4, 5, 6, 7, 8, to the extent of r = .71 ± .04, while mathematics
similarly measured yields r = .58 ± .06. Kelley combines
elementary school marks, teachers' estimates, and special tests of
33 pupils and predicts first year high school standing by the
regression equation with an efficiency, r = .89 ± .02. He says
(p.  70):  "This  very  high  correlation  is  of  interest  in  showing  the 
stability  of  individual  character." 
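
The figures following the ± sign in these correlations appear to be probable errors. As a rough check, and assuming the conventional formula P.E. = .6745 (1 - r²) / √N was the one used (the text does not say so), the sketch below recomputes them from the coefficients and group sizes quoted above. The function name is illustrative.

```python
import math

def probable_error_of_r(r, n):
    """Conventional probable error of a correlation coefficient r
    computed from n cases: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r ** 2) / math.sqrt(n)

# Coefficients and group sizes as quoted from Kelley in the text.
cases = [(0.71, 59), (0.58, 59), (0.89, 33)]
probable_errors = [round(probable_error_of_r(r, n), 2) for r, n in cases]

for (r, n), pe in zip(cases, probable_errors):
    print(f"r = {r:.2f}, N = {n}: probable error about {pe:.2f}")
```

Under that assumption the computed values agree with the printed ones (.04, .06, and .02), which suggests that the ± figures may be read as probable errors of the coefficients.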

He further finds (p. 13): "Indeed it seems that an estimate of a
pupil's  ability  to  carry  high  school  work  when  the  pupil  is  in  the 
fourth  grade  may  be  nearly  as  accurate  as  a  judgment  given  when 
the  pupil  is  in  the  seventh  grade,  for  the  correlation  in  the  former 
case  is  .62  and  in  the  latter  only  .10  higher." 

Miles  1  finds  that  the  correlation  between  average  elementary 
school  grade  and  the  average  high  school  grade  is  .71.  This  is 
quite  in  harmony  with  the  results  of  the  Kelley  study  and  it  is 
probable  that  the  Miles  data,  treated  by  the  regression  equation 
method,  would  yield  correlations  between  .80  and  .90. 

C. C. Ross,2 in a preliminary investigation of the traits possessed
by 46 high school graduates, has found that in so extremely
selected  a  group  as  the  graduating  class  of  the  senior  high  school, 
average  marks  in  high  school  work  of  such  pupils  can  be  predicted 
to  the  extent  of  a  correlation  of  .74 ±.05  by  means  of  a  proper 
weighting of the school marks and absences, which had accumulated
in the principal's office, up to the time of graduation from
the  eighth  grade.  The  factors  of  most  importance  in  this  case  are 
the  school  marks  made  in  the  school  subjects  throughout  the 
grade  school  career. 

The  eighth  grade  is  very  much  more  limited  in  range  of  ability 
than  is  an  age  group;  this  range  of  ability  is  presumably  greatly 
decreased  by  the  fact  that  only  a  portion  of  the  eighth  grade 
graduates  enter  high  school;  the  range  of  ability  of  the  entering 
high  school  students  is  further  delimited  by  reason  of  the  large 
elimination  which  takes  place  between  the  entrance  to  high  school 
and  graduation.  If  one  could  adequately  allow  for  the  change  in 
range  of  ability  involved  in  all  of  the  above,  the  correlation 
coefficient  of  .74  would  undoubtedly  be  considerably  increased. 
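
The effect of a curtailed range upon a correlation can be sketched with the usual correction for restriction of range, which assumes selection on the predictor and a linear relation with equal scatter throughout. The ratio of standard deviations used below is purely hypothetical, since the text gives no figure for it, and the selection described above is not strictly of the kind the formula assumes; the sketch is illustrative only.

```python
import math

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted group.

    r_restricted: correlation observed in the selected group.
    sd_ratio: standard deviation of the predictor in the unrestricted
              group divided by that in the selected group (1.0 or more).
    """
    rk = r_restricted * sd_ratio
    return rk / math.sqrt(1.0 - r_restricted ** 2 + rk ** 2)

# The coefficient of .74 from the text, with an assumed (hypothetical)
# ratio of 1.5 between the unselected and the selected spread of ability.
corrected = correct_for_range_restriction(0.74, 1.5)
unchanged = correct_for_range_restriction(0.74, 1.0)
print(round(corrected, 3), round(unchanged, 3))
```

If the spread of ability in an unselected age group were half again as great as in the graduating class, the coefficient of .74 would rise to roughly .86; with no difference in spread it is left unchanged.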

An  intelligence  test  should  be  used  to  supplement,  not  to 
replace,  an  educational  history.  The  two  together  will  make  a 
better  prediction  than  either  alone. 

1 Miles, op. cit.

2 Ross, op. cit.

Any efficient test of ability with ideas will serve as such an
intelligence test. The value of its contribution in the prediction
of  success  in  school  work  has  been  made  clear  by  many  workers. 
As an illustration, we may take the work of Mann1 and Thorndike.

They  studied  the  qualifications  of  engineering  students  in  the 
case  of  34  Columbia  University  freshmen  in  engineering.1  The 
correlation  between  a  weighted  composite  of  seven  tests  and  a 
composite  criterion  of  intelligence  was  .87  ±.03;  and  for  fifteen 
weighted tests, .97 ± .01. This unusually high result was due to
an  extremely  large  range  of  ability,  such  as  is  seldom  found.  The 
seven  tests  consumed  five  hours'  time  in  administration.  Five 
of  the  seven  selected  tests  were  mathematics  tests,  and  the 
remaining  two  were  sentence  completion  tests.  The  same  fifteen 
tests  when  given  to  41  engineering  freshmen  at  the  University  of 
Cincinnati  correlated  with  "academic  achievement"  to  the  extent 
of .64 ± .06. The first year college ratings correlate .62 ± .07
with the second year college ratings; in other words, the tests
predict the first year marks as well as the first year marks
predict the second year marks. When given at the Massachusetts
Institute of Technology to 40 freshmen in engineering, the correlation
with academic rating was .49 ± .08, while the first year
marks correlated with the second year marks to the extent of
.64 ± .06. These are all engineering groups.

Entrance  to  the  professions  in  general  requires  now  a  high 
school graduation as a prerequisite, and we have made an elaborate
study, reported elsewhere,2 of the significance of a person's
score  in  a  standard  test  of  ability  with  ideas  as  an  indication  of 
how  likely  he  is  to  progress  to  high  school  graduation.  The 
following  summary  of  Miss  Cobb's  findings  will  indicate  the 
significance of the Army Alpha scores in this connection.

1 Mann, C. R. A Study of Engineering Education. Carnegie Foundation.
Bulletin No. 11, 1918.

2 Cobb, M. V. "The Limits Set to Educational Achievement by Limited Intelligence."
Journal of Educational Psychology, Vol. 13, Nos. 8 and 9 (Nov. and Dec.,
1922).

The high school population in this country is limited to approximately
the upper half of the whole range of American intelligence,
as measured by Alpha. This means an Alpha score of 65 up.
Children who, at 14 years of age, cannot score more than 65 are
not likely even to enter high school. For success in a non-academic
course in which the subjects are for the most part
definitely less difficult than algebra, the ability to score at least
85  to  100  is  usually  necessary.  Our  Michigan  data  give  evidence 
that,  of  the  freshmen  who  score  77  or  less  on  Army  Alpha,  about 
87  per  cent  drop  out  before  the  senior  year.  Madsen's  data 
confirm  this;  about  84  per  cent  of  the  freshmen  who  score  77  or 
less  are  eliminated.  This  means  that  only  about  one  in  seven 
remains to graduate, unless the Alpha score is better than 77.
Similarly,  the  Army  data  show  that  when  the  recruits  were  in 
school,  10  years  or  more  ago,  78  per  cent  of  those  scoring  below  85 
in  later  life  had  not  remained  in  high  school  to  graduate.  In  the 
group  of  commissioned  officers  in  which  other  character  traits 
were  on  the  whole  very  high,  the  elimination  among  those  scoring 
below  85  was  only  24  per  cent.  This,  however,  is  more  than 
twice  the  proportion  dropped  from  this  officer  group,  as  a  whole, 
i.e.,  those  scoring  above,  as  well  as  below,  85;  only  10  per  cent  of 
the  entire  group  failed  to  graduate. 

For  a  strictly  academic  course,  including  algebra,  success,  i.e., 
profit  to  the  child,  is  doubtful  for  a  child  who  at  14  cannot  score 
100 to 110 or better on the Army test. This means a mental age
of 15-6 to 16-2 and (at 14 years) an I.Q. of 110 to 115. Proctor
mentions  67  as  a  general  minimum,  but  this  seems  to  us  rather 
low;  it  means  an  intelligence  quotient  of  95.  Probably  nine 
times  out  of  ten  it  is  unwise  to  guide  the  average,  or  less  intelligent 
than  average,  child  into  the  present  academic  high  school. 
Unless  his  I.Q.  is  over  100,  or  his  mental  age  over  14,  he  should  be 
encouraged  to  try  some  other  type  of  training. 
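The mental ages and I.Q.'s quoted above are related by the usual ratio definition, I.Q. = 100 × mental age / chronological age. The following is a minimal sketch of that arithmetic; the function name `ratio_iq` is ours, and ages written as "15-6" are read as 15 years 6 months.

```python
def ratio_iq(mental_years, mental_months, chrono_years, chrono_months=0):
    """Ratio I.Q.: 100 times mental age over chronological age (both in months)."""
    mental = mental_years * 12 + mental_months
    chrono = chrono_years * 12 + chrono_months
    return 100.0 * mental / chrono

# Mental ages of 15-6 and 16-2 for a child of exactly 14 years.
low = ratio_iq(15, 6, 14)
high = ratio_iq(16, 2, 14)
print(f"{low:.1f} to {high:.1f}")
```

Both values fall in the range of roughly 110 to 115 given in the text.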

The  value  of  a  score  in  ability  to  deal  with  ideas  as  a  means  of 
predicting  fitness  for  the  actual  work  of  a  vocation  will,  of  course, 
depend  upon  what  the  vocation  is.  Doubtless  the  correlation  is 
in  general  positive,  and  in  some  cases  high.  Knight1  found 
among a group of superior high school teachers, a "corrected for attenuation" correlation of .57 between score in such a sixty-minute test and reputed success as teachers, and we may therefore estimate conservatively that if ten thousand fourteen-year-olds were taken at random, tested with such a test, trained to be
high  school  teachers  and  tried  at  the  job,  the  correlation  between 
the  test  score  and  their  future  success  as  high  school  teachers 
would  be  well  over  .90. 
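A correlation "corrected for attenuation" is ordinarily obtained by dividing the observed correlation by the square root of the product of the two reliability coefficients. The sketch below illustrates that formula only. The observed correlation and the reliabilities used here are hypothetical, since the text does not report the figures Knight worked from.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores.

    r_xy: observed correlation between test and criterion
    r_xx: reliability of the test
    r_yy: reliability of the criterion
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values, chosen only to show the size of the effect.
observed = 0.40
corrected = correct_for_attenuation(observed, r_xx=0.80, r_yy=0.60)
print(round(corrected, 2))
```

The less reliable the two measures, the larger the corrected value becomes relative to the observed one.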

On  the  other  hand,  there  are  industrial  careers  where  ability 
to  deal  with  ideas  is  probably  a  very  minor  factor  in  success. 

1 Knight, F. B. Qualities Related to Success in Teaching. 67 pp. Teachers College Contributions to Education, No. 120, New York, 1922.

The relationship between general intelligence test scores and specific occupational aptitudes may be low. This is illustrated
by  two  studies  reported  in  the  Memoirs  of  the  National  Academy  of 
Science, Vol. XV.1 The Pearson association coefficient, C, between letter ratings secured on army trade tests and letter ratings
secured  on  Army  Alpha,  is  shown  for  four  groups  below.  Of 
these  four  groups,  the  truck  drivers  were  tested  on  a  performance 
test,  while  the  other  three  groups  were  tested  on  oral  trade  tests. 

Occupation              Number of Cases    Pearson C

Auto repairman                666             .265
Machinist                     451             .254
Horse hostler                 296             .277
Heavy truck driver            644             .130

The above results are complicated by varying amounts of experience. The conclusions derived from the detailed tables from which the above was constructed are:

It  would  seem  that  the  function  of  intelligence  plays  a  varying 
part  in  its  relation  to  degrees  of  skill  as  classified  by  trade  tests. 
For  example  in  the  case  of  general  auto  repairmen,  we  seem  to 
find  a  difference  (in  Alpha  scores)  only  between  the  apprenticeship 
level  and  the  higher  levels  of  skill.  In  the  case  of  machinists  we 
find  the  difference  only  between  the  expert  level  and  the  two  lower 
levels. In the case of truck drivers we find no significant differences between levels of skill, although intelligence is demonstrably
a  factor  in  qualifying  in  the  apprenticeship  level  of  that  trade. 

Of even more interest is a result of a study of a group of graphotype operators, for whom, evidently, a very objective criterion
of  efficiency  was  available.  "In  a  group  of  106  graphotype 
operators  of  the  Treasury  Department,  the  median  Alpha  score 
is  75  with  extremes  of  11  and  174.  The  median  of  the  average 
daily  output  of  plates  by  this  group  is  245  with  a  median  error  of 
2.9  per  cent.  The  highest  individual  average  is  391  plates  per 
day, the lowest, 113. The correlation between output and accuracy is .113 ± .06, and between Alpha and accuracy, .019 ± .06, and between Alpha and output, -.087 ± .06. The returns are of
special  interest  in  that  they  exhibit  such  low  correlations  between 
intelligence  and  accuracy  and  speed  in  mechanical  work." 

1 pp. 835-37.

2 Otis, A. S. "The Selection of Mill Workers by Mental Tests." Journal of Applied Psychology, Vol. 4, No. 4, pp. 339-41 (1920).

Otis2 has similarly found "zero" correlation between "productive ability ascertained by careful investigation" and intelligence
in a group of 400 employees of a silk mill, and comes to the conclusion that "intelligence is not only not required in a modern silk
mill  for  most  operations,  but  may  even  be  a  detriment  to  steady, 
efficient,  routine  work.  What  qualities  are  required,  remain  to  be 
sought.  Whether  they  are  measurable  is  doubtful.  They  may 
be  stolidity,  inertia  of  attention,  regularity  of  habits,  etc."  A 
number  of  factors  serve  to  attenuate  his  results,  yet  the  essential 
conclusion  of  his  research  clearly  points  out  the  small  role  which 
general  intelligence  may  have  in  factory  work. 

On the other hand, Link1 has found very significant relationships between specialized tests and operations which are no more
complex  than  those  involved  in  silk  mill  operations. 

E.  M.  Martin,  in  an  unreported  investigation  of  the  talents  of 
30 policemen, found a correlation of .80 ± .04 between a) a criterion composed of a composite of one ranking and one rating of patrolmen by each of four commanding officers, and b) a statistically
selected  weighted  composite  of  intelligence  and  educational  tests 
and  physical  and  social  traits. 

The  I.E.R.  Arith.-Re.  Test 

Any  one  of  the  better  tests  of  so-called  general  intelligence  will 
serve  the  purpose  of  vocational  guidance  of  children  from  13  to  16. 
We  suggest  the  use  of  a  combination  of  the  Thorndike-McCall 
Reading  Test  and  the  test  in  Arithmetical  Problem-Solving. 
This  combination  of  tests  is  referred  to  in  this  study  as  the  I.E.R. 
Arith.-Re.  Test.  This  makes  an  intelligence  test  which  is  easy  to 
give  and  easy  to  score,  which  all  the  children  understand,  which  is 
remarkably  free  from  harmful  effects  of  special  practice,  and 
which  is,  or  may  be  made,  uncoachable,  since  there  already 
exist  ten  alternative  forms  of  it,  and  hundreds  more  can  be 
made. 

1 Link, H. Employment Psychology. 440 pp. Macmillan Company, 1919.

The value of many of the current intelligence scales is impaired by reason of their high susceptibility to improvement through practice. Ratings from such a scale will generally become higher and higher as successive forms of the test are taken by the test subject. This variation in ratings due to practice improvement does not hold in the case of arithmetic and reading, ratings from which are more nearly constant, due to the fact that at any one
particular  time  both  reading  and  arithmetic  have  been  almost 
maximally practiced. If a weighted combination of a good arithmetic and a good reading test would correlate as highly on the average with the present intelligence scales as the average correlations of such intelligence scales with all others, then we would be
justified  in  assuming  that  for  practical  purposes  such  a  test  is  as 
valid  an  intelligence  measure  for  the  general  school  population  as 
the  current  intelligence  tests.  This  makes  the  assumption,  which 
has  abundant  substantiation  in  fact,  that  such  a  combination  will 
correlate  as  highly  with  ability  to  progress  in  school  as  will  the 
intelligence  tests.  Whether  such  a  combination  correlates  with 
the  current  intelligence  scales  is  an  aspect  of  reliability;  whether 
it  correlates  with  an  adequate  measure  of  school  progress  ability 
is  the  corresponding  aspect  of  validity.  Naturally  one-half  of 
total  school  work,  if  adequately  measured,  correlates  more  highly 
with  the  other  half  of  school  work  itself  than  will  any  possible 
practical  combination  of  opposites  tests,  completion  tests,  and  the 
like.  A  measure  of  at  least  two  school  abilities  is  desirable  for 
giving  educational  guidance.  A  knowledge  of  one's  reading  and 
arithmetic abilities is valuable for the specialized abilities themselves; if we can secure an intelligence rating from them gratis, that is just so much additional information. Such tests will be unfair only to those who have language difficulty. The work of Kelley1 indicates that there is as much reason to believe that one's educational status remains approximately constant throughout his school career as that his intellectual brightness remains constant; this would be apparent if we could rid our minds of the
arbitrary  nature  of  the  units  at  present  employed  in  measuring 
brightness. 

1 Kelley, T. L. Educational Guidance.

The Weighting of Arithmetic and Reading for Use as an Intelligence Score. In the absence of any criterion of general ability to progress in school work it was decided to weight arithmetic and reading equally. A number of investigations have seemed to indicate that in all probability reading should be weighted the higher of the two, deficiencies in reading being of greater significance for determining the rate of progress through school. However, an undue weighting of reading will make the test less reliable in the individual cases of pupils who have language
difficulty.  This  fact  points  to  a  possible  desirability  of  not  giving 
reading  as  much  weight  as  might  otherwise  be  accorded  to  it. 
It  was,  however,  decided  to  weight  arithmetic  and  reading  equally. 
Since  there  are  available  a  number  of  alternative  forms  of  the 
Thorndike-McCall  Reading  Scale,  each  of  which  varies  slightly 
from  the  others  in  difficulty,  but  which  variation  is  theoretically 
adequately  taken  care  of  by  using  the  T-score  rather  than  the 
gross  score  made  by  each  individual  child,  the  T-scores  were 
used  as  the  basis  for  determining  the  variability  of  the  Thorndike- 
McCall  Reading  Test.  Such  T-scores  were  not  available  for  the 
Thorndike  Arithmetical  Problem-Solving  Test  so  raw  scores  were 
used  in  determining  its  variability.  For  the  467  pupils  in  Public 
School  B  the  standard  deviation  of  reading  T-scores  was  10.52; 
the corresponding standard deviation for the Thorndike arithmetic raw scores was 3.40. If, then, we weight the raw scores of arithmetic 3 and the reading T-scores 1 (3 × 3.40 = 10.20 against 10.52), arithmetic will be given a partial correlation importance of 97/100 of reading, giving reading a slightly greater weight. This enables the weighting to
be  very  readily  done  since  the  multiplication  of  the  raw  scores  of 
the  arithmetic  can  be  done  mentally  and  no  computation  is 
required  on  the  reading  T-scores.  The  entire  school  involving 
all  pupils  in  6A  (first  semester  of  the  sixth  grade)  and  above,  and, 
in  addition,  all  boys  13  years  of  age  and  over  in  the  entire  school, 
was  used  to  determine  the  variability  for  the  ages  to  which  one  is 
likely  to  be  giving  vocational  advice,  namely  13,  14,  and  15  years. 
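The weighting just described can be checked numerically. The sketch below takes the arithmetic standard deviation as 3.40, the value consistent with the stated ratio of 97/100, and the reading standard deviation as 10.52. The function name `arith_re_score` is ours and is not taken from the test manual.

```python
SD_READING_T = 10.52   # standard deviation of reading T-scores, School B
SD_ARITH_RAW = 3.40    # standard deviation of arithmetic raw scores (assumed reading of the figure)
WEIGHT_ARITH = 3
WEIGHT_READING = 1

def arith_re_score(arith_raw, reading_t):
    """Composite score: arithmetic raw score times 3 plus reading T-score."""
    return WEIGHT_ARITH * arith_raw + WEIGHT_READING * reading_t

# Effective spread each part contributes to the composite.
arith_spread = WEIGHT_ARITH * SD_ARITH_RAW
reading_spread = WEIGHT_READING * SD_READING_T
ratio = arith_spread / reading_spread
print(round(ratio, 2))
```

Because the two weighted spreads are nearly equal, neither subject dominates the composite, which is the sense in which the two are weighted equally.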
The distribution of reading T-scores and arithmetic raw scores shows that the number of pupils who fail to grasp the directions of the tests is insignificant and also that there is a good spread of scores in these ages. The results of the intercorrelations of this composite variable, hereinafter called the "Arith.-Re." Test, with other well known measures of intelligence show that it correlates as well with such measures as the general run of such measures correlate with each other. The intercorrelations of reading and arithmetic with half-year gains in school work and with C-1, General Clerical Test show that reading correlates slightly better than arithmetic with half-year gains, and that reading likewise correlates better with C-1. This holds true for both boys and
girls of separate ages 12-15 inclusive, with but one or two exceptions. Absence from school does not affect this intelligence measure quite as much as it affects a pupil's half-year gains as shown by the correlation coefficients in Table V, page 22.

TABLE I

The Intercorrelations by Age Groups of Intelligence Measures Administered to Girls, in Public Schools G and J, on Whom the Records Were Complete*

                                          r at age                Wtd.
Measures correlated                 11    12    13    14    15   Av. r†

Half-Year Gains with Arith.-Re.    -.05   .45   .36   .51   .22   .31
Half-Year Gains with C-1            .38   .60   .33   .19   .70   .41
Half-Year Gains with C-2            .17   .40   .31   .15   .17   .25
Half-Year Gains with Haggerty       .41   .53   .27   .43   .55   .43
Half-Year Gains with N.I.T.         .31   .45   .37   .21   .30   .33
Arith.-Re. with C-1                 .47   .53   .81   .75   .64   .64
Arith.-Re. with C-2                 .10   .40   .70   .57   .55   .45
Arith.-Re. with Haggerty            .48   .69   .81   .85   .80   .72
Arith.-Re. with N.I.T.              .54   .62   .79   .71   .88   .69
C-1 with C-2                        .47   .66   .81   .65   .72   .66
C-1 with Haggerty                   .47   .58   .76   .83   .76   .67
C-1 with N.I.T.                     .54   .62   .80   .80   .76   .70
C-2 with Haggerty                   .20   .40   .64   .71   .62   .50
C-2 with N.I.T.                     .13   .46   .76   .56   .72   .50
Haggerty with N.I.T.                .57   .80   .82   .79   .87   .76

Number of cases: age 11, 31; ages 12 and 13, 42 and 47 (assignment not legible); age 14, 30; age 15, 14.

Av. of Wtd. Av. r's: Half-Year Gains, .35; Arith.-Re., .56; C-1, .62; C-2, .47; Haggerty Delta 2, .62; N.I.T., .60.

† Weighted average r found by weighting all ages 1, save age 15, which is weighted 1/2.
* The Probable Error of Correlation Coefficients in this table may be determined from the following table:

P.E.r When r Is of the Value

 N      0    ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9

14    .18   .18   .17   .16   .15   .14   .12   .09   .06   .03
30    .12   .12   .12   .11   .10   .09   .08   .06   .04   .02
31    .12   .12   .12   .11   .10   .09   .08   .06   .04   .02
42    .10   .10   .10   .09   .09   .08   .07   .05   .04   .02
47    .10   .10   .09   .09   .08   .07   .06   .05   .04   .02
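The probable errors tabulated for Table I appear to follow the usual formula for the probable error of a correlation coefficient, P.E. = .6745 (1 - r²) / √N. The sketch below, with a function name of our own choosing, reproduces a few of the tabled entries under that assumption.

```python
import math

def probable_error_r(r, n):
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Spot checks against the probable-error table for Table I.
for n, r in [(14, 0.0), (14, 0.5), (30, 0.3), (47, 0.9)]:
    print(n, r, round(probable_error_r(r, n), 2))
```

The formula shows why the probable error shrinks both as the number of cases grows and as the correlation approaches unity.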

The  Comparison  of  the  Arith.-Re.  Test  with  Standard  Intelligence 
Tests. At Public School G a number of intelligence and educational tests had been given in the fall of 1921 by Dr. J. L. Stenquist, of the Bureau of Reference and Research of the New York
City  Schools.  These  scores  were  available  for  comparison  with 
the  corresponding  intelligence  scores  determined  in  the  present 
investigation.  The  records  of  164  people,  divided  among  five 
ages,  were  found  complete  for  the  following  test  variables: 

1. Half-year gains, an objective measure of school progress.

2. The I.E.R. Arith.-Re. Test, an intelligence measure.

3. The I.E.R. General Clerical Test, C-1, weighted with the series of weights used in the Company I investigation, 3, 3, 3, 10, etc.

4. The I.E.R. Clerical Test, C-2, weighted.

5. Haggerty Delta 2 Intelligence Test.

6. The National Intelligence Test.

The  intercorrelations  of  the  six  variables  by  the  five  age  groups 
are  shown  in  Table  I,  together  with  the  weighted  average  of  the 
five  in  each  compartment.  This  weighted  average  is  found  by 
weighting  the  correlations  of  age  15,  with  small  number  of  cases, 
1/2,  and  the  four  remaining  ages,  1  each.  At  the  foot  of  the 
table  is  shown  the  average  of  the  weighted  average  correlations  of 
the respective columns. These latter figures show that C-1, Haggerty, and N.I.T. each correlate about .60 on the average with all the others, while Arith.-Re., weighted C-2, and half-year gains correlate less in descending order, respectively. Whereas Arith.-Re. does not correlate so highly (.56) with all the other intelligence measures, it correlates as highly with Haggerty (.72) and N.I.T. (.69) as either Haggerty or N.I.T. correlates with any of the other variables save Haggerty with N.I.T. (.76).
These  correlations  are  all  substantially  about  .70.  It  correlates 
.31  with  "half-year  gains,"  whereas  N.I.T.  correlates  only  .33 
and  Haggerty  .43. 
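The weighted averages discussed above can be recomputed from the age-group correlations. The sketch below assumes a simple weighted mean of the r's, with ages 11 to 14 weighted 1 and age 15 weighted 1/2. The age-group values are those read from Table I, and the function name is ours.

```python
def weighted_average_r(rs, weights):
    """Weighted mean of correlation coefficients."""
    return sum(r * w for r, w in zip(rs, weights)) / sum(weights)

AGE_WEIGHTS = [1, 1, 1, 1, 0.5]   # ages 11, 12, 13, 14, 15

# Arith.-Re. with half-year gains, and Arith.-Re. with Haggerty, ages 11-15.
arith_gains = weighted_average_r([-0.05, 0.45, 0.36, 0.51, 0.22], AGE_WEIGHTS)
arith_haggerty = weighted_average_r([0.48, 0.69, 0.81, 0.85, 0.80], AGE_WEIGHTS)
print(round(arith_gains, 2), round(arith_haggerty, 2))
```

These agree with the figures of .31 and .72 quoted in the text.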

These  results  are  corroborated  by  a  study  of  the  correlations 
with  "half-year  gains"  and  "average  work,"  two  measures  of 
school  success  in  the  case  of  age  groups  of  all  boys  and  girls  on 
whom  these  data  were  available.  The  correlations  were  as 
follows: 


Correlation of I.E.R. Arith.-Re. Score

Group              No. of Cases    With Half-Year Gains    With Average Work

12-year boys           107              .58 ± .04               .58 ± .04
13-year boys           151              .59 ± .04               .39 ± .05
14-year boys           120              .74 ± .03               .38 ± .05
[15-year boys]          57              .63 ± .05               .23 ± .08
[12-year girls]         76              .57 ± .05               .68 ± .04
[13-year girls]        120              .56 ± .04               .57 ± .04
14-year girls           83              .61 ± .05               .60 ± .05
[15-year girls]         39              .24 ± .10               .59 ± .07


CHAPTER  III 


TESTS  OF  ABILITY  WITH  THINGS  AND 
MECHANISMS:    BOYS'  TESTS 

We  have  experimented  in  various  ways  with  the  following  tests: 

1. The Stenquist Assembly Test of Mechanical Ability, devised by Dr. J. L. Stenquist and made by Stoelting and Co., referred to hereafter as the Stenquist Assembly.

2. The I.E.R. Assembly Test for Girls.

3. The Stenquist Mechanical Aptitude Test I, and

4. The Stenquist Mechanical Aptitude Test II, referred to hereafter as the Stenquist Picture Tests I and II.

5. The Thurstone Manual Training Information Test.

6. The Army General Trade Test.

7. The Army Mechanical Interest Test, referred to sometimes as the M.I.T. Test.

Concerning the relative merits of these tests in various respects we ask whether they measure an ability or group of abilities that is distinct from general intelligence; how well they may be expected to predict success in mechanical work; how reliable they are, and how they should be given and scored to obtain satisfactory results economically.

The  Intercorrelations  of  Mechanical  Tests 

In  order  to  determine  the  intercorrelations  among  a  number  of 
so-called  mechanical  tests  used  by  different  research  workers,  the 
following  six  tests  were  administered  to  boys  of  the  ages  12,  13,  14 
and  15,  of  School  B:  The  Thurstone  Manual  Training  Test 
(after  deleting  some  twenty  items  bearing  on  machine  shop 
practice),  the  Stenquist  Assembly  Test,  the  Stenquist  Picture  I 
and  Picture  II,  the  Army  General  Trade  Test,  and  the  Army 
Mechanical Interest Test. The weighted average of the intercorrelations of the four age groups was computed by weighting age 15 one-half (on account of the small number of cases) and ages 12, 13, 14, one each.

TABLE II

Intercorrelations by Age Groups of Mechanical Tests for 145 Boys of Public School B*

                                                r at age             Wtd.
Tests correlated                           12    13    14    15    Av. r†

Manual Training with Stenquist Assembly   .17   .12   .23   .25     .18
Manual Training with Stenquist Picture I  .20   .19   .15   .29     .20
Manual Training with Stenquist Picture II .36   .19   .05   .27     .21
Manual Training with General Trade        .51   .28   .32   .58     .40
Manual Training with M.I.T.               .47   .22   .16   .59     .33
Stenquist Assembly with Picture I         .42   .42   .60   .03     .42
Stenquist Assembly with Picture II        .36   .49   .31   .21     .36
Stenquist Assembly with General Trade     .34   .34   .26   .45     .33
Stenquist Assembly with M.I.T.            .35   .40   .52   .34     .41
Picture I with Picture II                 .48   .51   .62   .71     .56
Picture I with General Trade              .36   .28   .32  -.02     .27
Picture I with M.I.T.                     .38   .55   .50   .25     .44
Picture II with General Trade             .49   .49   .21  -.04     .33
Picture II with M.I.T.                    .52   .47   .43   .37     .46
General Trade with M.I.T.                 .70   .30   .41   .66     .50

Number of cases (N)                        31    46    49    19

Av. of Wtd. Av. r's: Manual Training, .26; Stenquist Assembly, .34; Stenquist Picture I, .38; Stenquist Picture II, .38; General Trade, .37; M.I.T., .43.

                      Manual    Stenquist  Stenquist  Stenquist   General
              Age     Training  Assembly   Picture I  Picture II  Trade     M.I.T.

Averages       12      45.8      28.5       32.5       32.3        4.7      19.7
               13      45.2      35.3       37.5       33.7        5.4      20.7
               14      46.0      47.0       41.9       37.1        7.0      21.2
               15   [illegible]  52.4       36.5       33.3        7.4      22.8

Standard       12      6.23     13.25       9.10       1.03 [?]    3.19     7.00
Deviations     13      4.57     15.26       9.74       9.10        2.72     5.94
               14      7.57     20.00      12.84       1.66 [?]    3.99     6.69
               15      5.81     19.86      10.62       9.98        6.69     6.46

† All ages weighted 1 except age 15, which was weighted 1/2.

* The Probable Error of the Correlation Coefficients in Table II may be found from the following table:

P.E.r When r Is:

 N      0    ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9

19    .15   .15   .15   .14   .13   .12   .10   .08   .06   .03
31    .12   .12   .12   .11   .10   .09   .08   .06   .04   .02
46    .10   .10   .10   .09   .08   .07   .06   .05   .04   .02
49    .10   .10   .09   .09   .08   .07   .06   .05   .03   .02

It will be noted in the "Wtd. Av. r" columns of Table II that the Stenquist Assembly correlates highest with the Stenquist Picture I; Stenquist Picture I correlates most highly with Stenquist Picture II, but only to the extent of .56; General Trade Test correlates most highly with
Mechanical  Interest  Test  (knowledge  of  the  use  of  tools),  .50,  and 
next  highest  with  the  Thurstone  Manual  Training  Test  (true-false 
questions  about  the  use  of  tools  and  the  properties  of  materials) ; 
the  Thurstone  Manual  Training  Test  correlates  highest  with  the 
General  Trade  Test  (recall  questions  about  general  mechanical 
information), .40. The average of these weighted average correlations, in the bottom row of the table, shows that the average intercorrelation of each of these tests with each of the others is about .40, save in the case of the Thurstone Manual Training Test, which correlates somewhat lower. This average intercorrelation of .40 for mechanical tests devised by various investigators is to be compared with the results of Table I which show that the
average  intercorrelation  of  a  number  of  tests  which  are  known  to 
correlate  rather  well  with  general  intelligence  is  about  .60  for 
similar  age  groups  of  girls.  This  rather  lower  intercorrelation 
may  indicate  a  greater  confusion  in  the  minds  of  the  builders  of 
mechanical  tests  in  regard  to  what  they  are  attempting  to  measure 
than  in  the  case  of  the  builders  of  intelligence  tests;  or,  it  may  be 
merely  the  result  of  the  lack  of  sufficient  mechanical  environment 
in the case of these boys to bring out their mechanical potentialities; or lower intercorrelations of "mechanical" tests may be the essential nature of such tests. In various other tests it has been found that practice improves the correlation between functions. It seems quite likely, once these boys are subjected to a more complicated mechanical environment, that the size of the correlations between these mechanical tests will increase. The mechanical environment of a New York City boy is quite limited
compared  to  that  of  a  country  boy,  or  one  living  in  a  small  town. 

In  Table  III,  the  averages  and  standard  deviations  on  the 
Stenquist  Picture  I,  Stenquist  Picture  II,  and  Stenquist  Assembly 
Test are compared for 19 fifteen-year-old New York City boys of
Public  School  B,  and  145  first-year  high  school  boys  of  High 
School  R.  School  R  is  located  in  a  small-sized  city  noted  for  the 
excellence  of  its  machine  shop  products;  each  boy  in  the  eighth 
grade  had  taken  prevocational  work  in  mechanical  courses. 

TABLE III

Averages and Standard Deviations of Scores on Three Mechanical Tests Made by 145 School R First-Year High School Boys and by 19 Public School 15-Year-Old New York City Boys

                    High School R,        P. S. B, N. Y. C.,     Difference of the
                    First-Yr. H. S.       15-Yr.-Old Boys        Averages in Favor of
                    Boys                                         High School Boys
Test                Aver.    S. D.        Aver.    S. D.

Sten. Pict. I       57.2     18.7         36.5     10.6               20.7
Sten. Pict. II      52.0     11.8         33.3     10.0               18.7
Sten. Assembly      71.2     16.8         52.4     19.9               18.8


Such  differences  as  shown  surely  cannot  be  due  to  chance.  It 
is  obviously  impossible  to  state  how  much  of  the  superiority  of  the 
small  city  first-year  high  school  boys  on  all  of  the  three  tests  is 
due  to  different  native  capacity,  different  specific  training, 
specific  selection,  or  richer  mechanical  environment. 
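The claim that the differences in Table III cannot be due to chance may be examined by comparing each difference with its standard error. The sketch below uses the ordinary formula for the standard error of a difference between two independent means, √(s₁²/n₁ + s₂²/n₂). This check is ours and is not a computation reported in the text. The group sizes of 145 and 19 are those stated for the two schools.

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

N_SCHOOL_R, N_SCHOOL_B = 145, 19

# (test, School R mean, School R S.D., School B mean, School B S.D.) from Table III
table_iii = [
    ("Sten. Pict. I", 57.2, 18.7, 36.5, 10.6),
    ("Sten. Pict. II", 52.0, 11.8, 33.3, 10.0),
    ("Sten. Assembly", 71.2, 16.8, 52.4, 19.9),
]

ratios = {}
for name, mean_r, sd_r, mean_b, sd_b in table_iii:
    diff = mean_r - mean_b
    se = se_of_difference(sd_r, N_SCHOOL_R, sd_b, N_SCHOOL_B)
    ratios[name] = diff / se
    print(f"{name}: difference {diff:.1f}, S.E. {se:.2f}, ratio {ratios[name]:.1f}")
```

Each difference is several times its standard error, which supports the statement in the text. It says nothing, of course, about which of the named causes produced the difference.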

There  are  also  available  for  comparison  the  intercorrelations  of 
the General Trade Test, the Mechanical Interest Test, the Stenquist Picture I and Picture II, and the Stenquist Assembly Test in
the  case  of  the  R  High  School  boys  and  the  entire  group  of  New 
York  City  public  school  boys.  These  intercorrelations  are  shown 
in  Table  IV,  where  the  correlations  for  the  New  York  City  boys 
are  given  as  the  weighted  average  of  the  intercorrelations  of  the 
four  age  groups,  12,  13,  14  and  15. 

The  intercorrelations  of  the  R  High  School  first-year  pupils  and 
the  intercorrelations  of  the  larger  group  of  the  same  pupils  in  the 
preceding  year  in  the  eighth  grade  prevocational  school,  are 
higher  for  the  most  part  than  the  corresponding  intercorrelations 
of  the  several  tests  in  the  case  of  boys  in  New  York  City  public 
schools. 

TABLE IV

Average Intercorrelations of Mechanical Tests for Four Groups*

Columns: Gen. Tr.; M.I.T.; Sten. I; Sten. II; Sten. Comb.; Sten. Assem.
Groups within each row: R. Prevocational; R. High School; N. Y. C. Girls; N. Y. C. Boys.

[The body of the table is not legible in this copy. Eight blocks of the four groups survive without their row headings. The entries that remain, in source order, are .70, .50, .31, .27, .50, .31, .27, .45, .36, .17, .33, .24, .41, .40, .42, .45, .36, .42.]

* With a variable number of cases in the case of the New York City boys and girls. The P.E.r can be computed for the R Prevocational Group and R High School Group by means of the following table:

P.E.r When r Is of the Value

  N     0.0   ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8   ±.9

145     .06   .06   .05   .05   .05   .04   .04   .03   .02   .01
208     .05   .05   .05   .04   .04   .04   .03   .02   .02   .01

It is obviously impossible to tell how much of this difference is due to difference in range of ability involved and how much to
other  causes.  Practically  all  of  the  intercorrelations  are  in  the 
neighborhood of .25 to .50. This gives some hope that a combination of these tests may predict mechanical ability considerably better than any single one or two of them. Indeed, the low reliability of the Stenquist (r11 = .60 in grade groups) indicates that
in  order  to  have  a  highly  reliable  mechanical  scale  we  must  either 
lengthen  the  time  limits  on  the  present  Stenquist  Test  or  else  add 
to  it  tests  of  other  abilities  which  are  important  in  predicting 
mechanical  aptitude.  As  a  general  policy,  it  is  usually  more 
promising  to  add  other  tests  with  different  content  than  to 
lengthen  the  present  test;  this  is  especially  true  if  the  second  test 
measures  a  unique  but  important  element  of  mechanical  aptitude. 
Such  a  test  can  be  recognized,  first,  by  its  high  correlation  with 
an adequate mechanical criterion, and second, by its low correlation with the present test or combination of tests.

When an adequate criterion is available, it can readily be determined whether the addition of any other test or the lengthening of the present scale to twice its present length will yield the
higher  multiple  correlation  coefficient  of  the  revised  combination 
with  the  criterion. 

In this table are also shown the average correlations, for the New
York City girls, of the Stenquist Assembly with the I.E.R. Girls
Assembly tests (.42). The average correlations between a measure of
intelligence and the several tests are also shown. These are for the
most part quite low, all being about .30 or below save in the case
of the combined Stenquist Picture and the General Trade Tests.

Correlation of Mechanical Interest Test and General
Trade Test

Through the cooperation of Dr. H. S. Hollingworth, of Columbia
University, the Mechanical Interest Test and General Trade Test were
administered to 31 Columbia students who took both forms. The
correlation between General Trade and M.I.T. is .885 ± .03. This
correlation is much higher than ordinarily is obtained between the
General Trade and M.I.T. in various other groups which have been
tested by these tests. This points to a possible hypothesis that
mechanical tests for these academic


TABLE V.    INTERCORRELATIONS OF TESTS AND VARIABLES, BY AGE GROUPS, IN THE CASE OF 435 BOYS OF PUBLIC SCHOOL B, AND 318 GIRLS OF PUBLIC SCHOOL G*


[Body of Table V: the intercorrelations, averages, and standard deviations for boys and girls are not legible in this copy.]



students  in  all  probability  function  more  as  intelligence  tests  than 
as  mechanical  tests.  The  correlation  of  .885  in  a  university  group 
indicates  a  very  high  relationship.  The  average  score  on  the 
M.I.T.  was  60  points  and  on  the  General  Trade  56  points.  These 
averages  are  to  be  compared  with  the  corresponding  averages  in 
the  case  of  the  Camp  Grant  soldiers,  which  show  that  the 
university  students  make  higher  averages  on  both  tests  than  do 
the  soldiers;  the  greater  difference  in  favor  of  the  university 
students  being  in  the  case  of  the  General  Trade  Test  (mechanical 
information),  a  test  which  is  known  to  correlate  slightly  higher 
(in  the  case  of  soldiers)  with  intelligence  than  does  the  M.I.T. 
(knowledge  of  the  use  of  tools). 


                           M.I.T.   Gen. Tr.     N
Camp Grant soldiers   M      51        39       240
                      σ      17        21
Columbia students     M      60        56        31
                      σ      19        26


The  Correlations  between  Tests  of  Ability  with  Things 
and  Mechanisms  and  Tests  of  Ability  with  Ideas  and 
Symbols 

Taking  boys  and  girls  as  we  find  them  in  the  public  schools, 
"ability  with  things"  is  notably  distinct  from  "ability  with  ideas 
and  symbols."  How  distinct  these  abilities  are  in  the  original 
inborn  constitution  of  man  we  do  not  know.  They  may  become 
more  and  more  divorced  by  circumstances  of  life  which  give 
certain  individuals  much  practice  with  things  and  little  with  ideas 
and  symbols  and  vice  versa.  As  things  are,  however,  we  get 
information  about  a  new  and  large  fraction  of  human  ability  when 
we  add  such  a  score  as  that  in  the  Stenquist  Assembly  or  the 
I.E.R. Assembly to a pupil's school record and score in
intelligence tests.

To  the  existing  evidence  for  the  distinctness  of  these  abilities 
our  investigation  adds  the  following: 

In the general summary table of intercorrelations (Table V) we
show, along with many other facts, results for 435 boys and 318
girls in the case of Half-Year Gains in School, Average Conduct in
School, and Average Work in School, as related to score made in the
Assembly Tests and to score made in the Arith.-Re. Test, and also
the relation between the Assembly Test score and the Arith.-Re. Test
score. The Assembly Test ability is clearly differentiated,
especially in boys. Most of the correlations are below .25.

With the assistance of Dr. L. J. O'Rourke, the Stenquist Assembly
Test was given to 145 high school boys in City R who had been tested
with Army Alpha a year previously when in prevocational classes in
the eighth grade. The correlation between Stenquist Assembly and
Alpha was only .14 ± .05.

Through  the  kindness  of  Dr.  H.  A.  Ruger,  we  have  records  for 
82  adults  in  a  three-hour  intelligence  test  and  also  in  the  Stenquist 
Assembly  Test.  The  average  correlation  is  only  .24  for  men  and 
.13  for  women,  as  shown  in  Table  VI. 

TABLE VI

The Correlation between Ability with Things and Ability with
Ideas: Correlations of Thorndike College Entrance Intelligence
Test and Stenquist Assembly Test

A. UNIVERSITY WINTER GROUP

Group          r     P.E.r    N    Stenquist Av.    σ
Both sexes    .18    ±.10    41         60          21
Women         .15    ±.12    28         53          20
Men           .41    ±.16    13         74          14

B. UNIVERSITY SUMMER SESSION GROUP

Group          r     P.E.r    N    Stenquist Av.
Both sexes    .06    ±.10    41         62
Women         .11    ±.13    25         56
Men           .06    ±.17    16         71

σ, Group B: only the values 29 and 19 are legible, and their rows cannot be determined.

The Prediction of Shop Ranks from the Stenquist Assembly,
the Stenquist Picture I and II, and Intelligence, When
the Criterion Correlations Are Estimated

At the beginning of the experiments with mechanical tests
conducted by the Institute, it became desirable to form some
conclusions as to whether or not a weighted composite of certain
other so-called mechanical tests would predict shop ranks as well as
the Stenquist Mechanical Assembly Tests. If, for instance, three
paper tests properly weighted would correlate as well with shop
ranks as does the Stenquist Mechanical Test, it would be useless to
give the longer, more expensive assembly test unless the people
singled out for practical diagnosis by the two sets of ratings were
markedly different people. We should emphasize this last point, for
it is theoretically possible that, with a correlation with shop
ranks of .71 or less in the case of each of two different tests, a
"genius" on the one test may be rated "idiot" on the other test
and vice versa.
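
One way to see where a figure such as .71 comes from is to ask how far apart two tests can be when each correlates equally with the same criterion. The sketch below is our illustration, not the authors' computation: it uses the standard requirement that three mutual correlations form a consistent set, and the function name is hypothetical.

```python
import math


def inter_test_bounds(r1c: float, r2c: float) -> tuple[float, float]:
    """Range of the inter-test correlation r12 that is consistent with
    the two criterion correlations r1c and r2c.

    Follows from requiring the 3x3 correlation matrix to be
    positive semidefinite.
    """
    spread = math.sqrt((1.0 - r1c**2) * (1.0 - r2c**2))
    return r1c * r2c - spread, r1c * r2c + spread


for r in (0.90, 0.71, 0.60):
    low, high = inter_test_bounds(r, r)
    print(f"criterion r = {r:.2f}: r12 may lie between {low:+.2f} and {high:+.2f}")
```

At about .71 the lower limit reaches zero, so the two tests may be wholly unrelated to each other; below that figure they may even correlate negatively.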

At the time only fragments of the needed correlations were
available, these being found in the work to be published in the
doctorate dissertation of J. L. Stenquist. The intercorrelations of
the various tests had not been obtained consistently on any one
group, although they had been obtained in certain instances for
practically all of the tests on various groups.

In the absence of the exact correlations it was determined to
resort to an approximation by having three judges estimate the
desired correlation coefficients after having carefully read Dr.
Stenquist's manuscript copy of the report of his work. These
correlations were estimated by Dr. Thorndike, Dr. Stenquist, and
Dr. Toops, and appear in Table VII. The fifth figure in each
compartment of this table represents the average intercorrelations
later found actually to exist in age groups 12 to 15 inclusive. The
intercorrelations of the tests are quite low with respect to the
criterion correlations of the first row of the table, since the
criterion correlations are probably too high for an age group and
are estimated rather for the population at large.

TABLE VII

Summary of Intercorrelations of Tests as Estimated
by Three Judges

                                      Shop    Stenquist   Stenquist   Intel-
Variable     Judge                    Ranks   Pict. I     Pict. II    ligence

Shop Ranks   Th                        ...      .55         .55        .45
             St                        ...      .60         .60        .45
             To                        ...      .62         .64        .23
             Compromise                ...      .60         .60        .45

Stenquist    Th                        .55      ...         .85        .55
Pict. I      St                        .60      ...         .60        .40
             To                        .62      ...         .78        .52
             Compromise                .60      ...         .75        .55
             Observed Average                   ...         .56        .55

Stenquist    Th                        .55      .85         ...        .55
Pict. II     St                        .60      .60         ...        .65
             To                        .64      .78         ...        .64
             Compromise                .60      .75         ...        .65
             Observed Average                   .56         ...        .60

Intel-       Th                        .45      .55         .55        ...
ligence      St                        .45      .40         .65        ...
             To                        .23      .52         .64        ...
             Compromise                .45      .55         .65        ...
             Observed Average                   .55         .60        ...

Stenquist    Th                        .75      .75         .65        .45
Assembly     St                        .65      .60         .50        .30
             To                        .76      .70         .66        .25
             Compromise                .75      .70         .65        .45
             Observed Average                   .42         .36



By the method of multiple ratio correlation, Stenquist Picture I
and Picture II and Intelligence were combined to predict shop ranks
as well as they could, using in turn the estimates of Drs.
Thorndike, Stenquist, and Toops, the compromise or modal correlation
system, and finally the true age intercorrelations together with the
criterion correlations of the compromise estimate. The results are
shown in Table VIII. The multiple ratio correlation coefficient for
the composite of three paper tests (Column f) was less than the
correlation of Stenquist Assembly with the criterion in all cases
save in the case of Dr. Stenquist's estimates. The material for
computing the true intercorrelations of tests was not available
until the time of writing this report, but it substantiates the
results found by the estimated intercorrelations, provided it be
granted that the criterion correlations are of the proper relative
magnitudes.
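
The comparison can be checked roughly from the estimates in Table VII. The sketch below uses the ordinary least-squares multiple correlation as a stand-in for the authors' "multiple ratio correlation," which is an assumption on our part. The helper names are hypothetical, and the input figures are the compromise and Stenquist estimates as read from Table VII.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_r(intercorr, criterion):
    """Least-squares multiple correlation of a composite with the criterion."""
    beta = solve(intercorr, criterion)
    return math.sqrt(sum(b * r for b, r in zip(beta, criterion)))


# Order of tests: Picture I, Picture II, Intelligence.
criterion = [0.60, 0.60, 0.45]  # same for the compromise and Stenquist estimates
compromise = [[1.00, 0.75, 0.55], [0.75, 1.00, 0.65], [0.55, 0.65, 1.00]]
stenquist = [[1.00, 0.60, 0.40], [0.60, 1.00, 0.65], [0.40, 0.65, 1.00]]

r_compromise = multiple_r(compromise, criterion)
r_stenquist = multiple_r(stenquist, criterion)
print(f"compromise estimates: R = {r_compromise:.2f} (Assembly alone: .75)")
print(f"Stenquist estimates:  R = {r_stenquist:.2f} (Assembly alone: .65)")
```

On these figures the three paper tests fall short of the Assembly Test under the compromise estimates and slightly exceed it under Dr. Stenquist's, which agrees with the pattern described in the text.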

We may conclude then, so far as the evidence goes, that the
composite of the three paper tests is somewhat inferior in terms of
the multiple ratio correlation coefficient to the Stenquist
Mechanical Assembly Test alone. The final test of this fact will, of
course, come when a valid mechanical criterion becomes available and
all the tests are administered to a large group of subjects. We thus
are reasonably certain that the Stenquist Assembly Test is to date
the most important single test contribution to the measurement of
general mechanical ability. Even if the three-test composite
correlated as highly with shop ranks as does the Stenquist Assembly
Test, we might yet be justified in selecting the Stenquist Test in
preference to the three-test composite for the reason that the
Stenquist Test correlates low with intelligence, whereas in age
groups the combined Picture Test alone correlates in the
neighborhood of .60 with intelligence. Thus the "failures" on the
intelligence test will by no means necessarily be the failures on
the Stenquist Assembly Test, although the failures on the
intelligence test will for the most part be the failures on the
combined Stenquist Picture Tests because of the high correlation
between the latter two.

A composite of the Stenquist Assembly Test with such other
mechanical tests as we now have available (Column g) will probably
predict shop ranks considerably better than the Stenquist test alone
(Column d). This is especially true if we are trying to predict
rates of learning a mechanical process, or length of time needed to
acquire a given amount of trade proficiency.

With the assumption of the intercorrelations involved in the
true age intercorrelation group, when Picture II, Picture I, and
Intelligence in turn, in order of decreasing amounts of contribution
to the multiple ratio correlation, are added to the Assembly Test
to make up a composite scale, the multiple ratio correlation
coefficient is .85. In this case the relative weights (β), which are
to be divided by their standard deviations in order to determine the
gross score weights, are those given in the accompanying table of
β weights.


The added practical effectiveness of a scale which would
correlate .85 with shop ranks, as compared with the Stenquist
Assembly Test which correlates .75 with shop ranks as given in
the table, is well worth the effort required to obtain it. The
standard error of estimate of the composite scale is then (r = .85)
only .53 of the standard deviation of shop ranks.
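
The figure of .53 follows from the usual formula for the standard error of estimate, sqrt(1 - r^2), expressed as a fraction of the criterion's standard deviation. The short sketch below, with a function name of our own choosing, compares the two correlations mentioned in the text.

```python
import math


def standard_error_of_estimate(r: float) -> float:
    """Standard error of estimate as a fraction of the criterion's
    standard deviation, for a predictor correlating r with it."""
    return math.sqrt(1.0 - r * r)


for label, r in (("Stenquist Assembly alone", 0.75), ("composite scale", 0.85)):
    print(f"{label}: r = {r:.2f}, error = {standard_error_of_estimate(r):.2f} sigma")
```

Raising the correlation from .75 to .85 reduces the error of prediction from about .66 to about .53 of the standard deviation of shop ranks.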

As a matter of fact, the universally low intercorrelations among
the so-called mechanical tests used in the past invite our
attention to the desirability of combining many of these into a
composite scale in order to predict mechanical ability. Mechanical
ability is probably quite as "general" as general intelligence. On
the other hand, we have shown in this investigation that general
intelligence can be adequately measured by two or more of the
standard school tests provided we are primarily interested in the
aspect of the correlation of the test with a criterion of ability to
progress in school. In the same way it may be found that one or
two rather basic mechanical tests will give practically as good
prediction of mechanical capacity as the two educational tests do
in predicting school progress ability. This seems reasonable
from the fact that, so far as tried out, none of the paper
mechanical tests give very high correlations with an adequate
criterion. A composite scale requires tests which shall correlate
low among themselves relative to the correlations with the
criterion. The current mechanical tests fulfill the first of these
requirements, low correlations among themselves, but probably do not
fulfill so well the second, high correlations with a criterion.

Test             β Weight

Assembly          1.000
Picture II         .618
Picture I          .359
Intelligence       .128

Two of the promising types of mechanical tests for such a
combination would seem to be an assembly test without a model,
the product not being predetermined, such a test as is found in
the Stenquist Assembly Test, and a test of imitating a model,
such as the I.E.R. Girls Assembly Test. These tests correlate
only about .40 with each other in the case of age groups, while,
presumably, both correlate fairly well with an adequate criterion.
More adequate predictions than are now possible can readily be
secured by multiplying the length of any of the forms of the
present mechanical tests. The increase in reliability of the
Stenquist test to be obtained by doubling the number of test
models has been mentioned in another place. The reliability of
the I.E.R. Assembly is not known at the present time, but this
test likewise will increase both in reliability and validity if given
over a longer period of time with more models. In all test work
the implicit assumption is that the few tasks which the subjects
attempt in the short period of time are, if the test is a good test,
a good sampling of the sum total of reactions of which the
individual is capable, called his "ability." If fifteen minutes'
sampling of such abilities is but a poor sampling of the total of
such abilities, we have but to increase the test to thirty minutes,
forty-five minutes, one day, three days, three months, three years,
or what not, or until we have had the individual actually perform
all of the reactions of which he is capable, whereupon,
theoretically, we would have a perfect measure of the ability which
we are trying to measure. The good test is that one which will
correlate highly with the sum total of such abilities when the test
is administered in a short, that is "practical," amount of time and
with a minimum of scoring and other administrative labor. This
criterion is not a practical one which can be statistically
approximated; hence we must rely on securing an adequate criterion.
The perfect test of mechanical ability would be one which would
cover many weeks, the individual being required in every
successive hour of the time to do new mechanical tasks and to be
objectively scored upon each task in turn. Stenquist has
attempted to abbreviate this process to one-half hour. It is the
writer's conviction, based upon known testing principles, that the
time should be at least doubled. And even two hours is little
enough time for a person to employ in determining his mechanical
capacity. A test of the Stenquist type which would involve four
times as many items would by no means require four times as long
to score. When given in groups, the extra time per person
required for administration of the test is negligible; the only
increases worth considering at any length are the increases in time
and labor required in the scoring, and the cost of test materials.

The Prediction of Success in Prevocational Courses in
Carpentry, Printing, Forging, and the Like, by Various
Tests

During the winter of 1920 and 1921, a preliminary investigation
of the talents possessed by 208 prevocational boys in the
Public School R, eighth grade, was conducted by Dr. Toops and
Dr. O'Rourke. The army form of the General Trade Test, the
Mechanical Interest Test, and the Army Alpha were given to all
boys. In addition, their ages to the nearest tenth part of a year
were determined and used in the intercorrelations. As a criterion
of proficiency in prevocational courses, teachers' school marks
were used. These undoubtedly took into account more of the
factor of intelligence than would a longer course in mechanical
work, since the instructor had each boy but a few periods. This
preliminary investigation showed that the General Trade and the
Mechanical Interest tests correlate highly enough with this
criterion, compared to the Army Alpha, to justify administering the
additional tests, Stenquist Picture Tests I and II and the Stenquist
Assembly Test, to those of the original eighth grade group who had
entered high school. Of the original 208 boys it was possible to
test 145 who had entered high school. The re-tests were given by
Dr. L. J. O'Rourke, assisted by the Institute of Educational
Research.

It is probably true that between the period of the eighth grade
and entering high school, the elimination was selective in both
intelligence and mechanical ability. A report of the significant
relationships found in the original investigations and in the
re-tests is given on the following pages.1

Each  student  selected  ordinarily  two  and  generally  three 
courses  from  the  following  list  which  he  pursued  for  eight  weeks, 
six  hours  each  week:  Electrical,  Carpentry,  Sheet-metal,  Foundry, 
Printing,  Forge,  Machine-shop,  Pattern-making. 

The Preliminary Investigation Conducted by Toops and O'Rourke,
1920-1921. In this investigation 208 cases were available. As
a criterion of mechanical ability each subject's school marks on
the courses completed were averaged. Corrections for differences
in the average marks of different trade courses were made. This
assumes that average students in each class had equal mechanical
abilities, an assumption probably only approximately true.


1  Prepared  by  Dr.  Herbert  A.  Toops  from  data  collected  by  himself  and  Dr.  L.  J. 
O'Rourke. 



The  intercorrelations  are  shown  in  Table  IX: 


TABLE IX

Intercorrelations of Tests of 208 Prevocational Boys*

Variable               Criterion   General   Mechanical   Alpha    Age
                                   Trade     Interest

Criterion                 ...        .41        .33        .19    -.11
General Trade             .41        ...        .70        .42     .12
Mechanical Interest       .33        .70        ...        .30     .10
Alpha                     .19        .42        .30        ...    -.26
Age                      -.11        .12        .10       -.26     ...

σ of Variable            1.7       15.7       14.7       20.5     1.04

* P.E.r when N = 208 cases:

r      =   ±.0   ±.1   ±.2   ±.3   ±.4   ±.5   ±.6   ±.7   ±.8
P.E.r  =   .05   .05   .05   .04   .04   .04   .03   .02   .02
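
The probable errors in the footnote agree with the customary formula P.E.r = .6745 (1 - r^2) / sqrt(N). The sketch below is our own check rather than the authors' procedure. It recomputes the row for N = 208; an entry may differ from the printed table by one unit in the second decimal, depending on how the original figures were rounded.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


n_cases = 208
for tenth in range(0, 9):
    r = tenth / 10
    print(f"r = {r:.1f}   P.E.r = {probable_error_of_r(r, n_cases):.3f}")
```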

The two mechanical tests correlate .70 ± .02 with each other;
they correlate low with intelligence: Army Alpha and General
Trade, r = .42 ± .04; Mechanical Interest and Army Alpha, r =
.30 ± .04. Age correlates positively, but low, with both
mechanical tests. All facts seem to show that mechanical ability is
more dependent upon age than is school ability or intelligence. In
the above table, age correlates negatively with Army Alpha, as is
usually the case in a school grade group.

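The multiple ratio regression equation for combining the four
variables for predicting the criterion is:

\[
\frac{x_1}{\sigma_1} = 1.00\,\frac{x_{\text{Gen.Tr.}}}{\sigma_{\text{Gen.Tr.}}}
+ .2487\,\frac{x_{\text{M.I.T.}}}{\sigma_{\text{M.I.T.}}}
+ .0426\,\frac{x_{\text{Alpha}}}{\sigma_{\text{Alpha}}}
- .4428\,\frac{x_{\text{Age}}}{\sigma_{\text{Age}}}
\]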
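
The correlation of such a composite with the criterion can be recomputed from the two-decimal entries of Table IX. The sketch below uses ordinary least-squares multiple correlation as a stand-in for the authors' multiple ratio method, which is an assumption; the weights it finds need not match the printed ratio weights, and the helper names are hypothetical. Tests are added in the order General Trade, M.I.T., Alpha, Age.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


names = ["Gen. Tr.", "M.I.T.", "Alpha", "Age"]
intercorr = [
    [1.00, 0.70, 0.42, 0.12],
    [0.70, 1.00, 0.30, 0.10],
    [0.42, 0.30, 1.00, -0.26],
    [0.12, 0.10, -0.26, 1.00],
]
criterion = [0.41, 0.33, 0.19, -0.11]


def multiple_r(k: int) -> float:
    """Multiple correlation of the first k tests with the criterion."""
    sub = [row[:k] for row in intercorr[:k]]
    beta = solve(sub, criterion[:k])
    return math.sqrt(sum(b * r for b, r in zip(beta, criterion[:k])))


accumulating = [multiple_r(k) for k in range(1, 5)]
for k, r in enumerate(accumulating, start=1):
    print(" + ".join(names[:k]), f"-> R = {r:.3f}")
```

With all four variables the result comes out near .447, close to the .446 reported for the full composite. Nearly all of the predictive value lies in the General Trade Test, with Age adding a little.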

The accumulating test composite correlates with the criterion
to the extent:

Accumulating Test Composite            Correlation with Criterion

General Trade alone                          .412 ± .04
Gen. Tr. + M.I.T.                            .416 ± .04
Gen. Tr. + M.I.T. + Alpha                    .417 ± .04
Gen. Tr. + M.I.T. + Alpha + Age              .446 ± .04

The Follow-up Investigation of Boys Who Entered High
School: There were 145 cases of the retest group in the High
School who had complete scores on Stenquist I, Stenquist II,
Stenquist Assembly, Age, M.I.T., General Trade. The computed
correlations were as follows:


N = 145

Correlation Between Alpha and:

Stenquist I             .08 ± .06
Stenquist II            .33 ± .05
Stenquist Assembly      .14 ± .06
Age                    -.22 ± .05
M.I.T.                  .30 ± .05
General Trade           .23 ± .05

Intelligence, measured one year, is but little related to
mechanical ability measured the following year. The correlations on
the retest group with other variables the year previous are
substantially the same as in the case of the 208 subjects.

Dr.  O'Rourke  then  selected  100  cases  of  the  retest  group  and 
correlated  them  with  the  old  criterion  in  their  actual  scores,  that 
is,  without  grouping  the  gross  scores.  (There  are  thus  14  classes 
instead  of  the  original  8  classes.)    The  correlations  are: 


N = 100

Var. No. 6            3          4          5          7          8          9
                    M.I.T.    Gen. Tr.    Alpha     Sten. I   Sten. II    Sten.
                                                                          Assembly

Correlation with
Criterion          .02 ± .07  .07 ± .07  .20 ± .06  .02 ± .07  .10 ± .07  .00 ± .07

There was no relationship in evidence in the case of any of the
variables. These results led him to believe that the criterion
must be unreliable. Accordingly, he computed the following
correlations of prevocational school marks with a composite of
2 × General Trade gross scores + Army Alpha gross scores (weighting
General Trade about 1½ times as much as Army Alpha when
σ's are considered).

Correlation With Test Composite

Course               r       P.E.r     N

Electrical          .22      ±.09     55
Carpentry           .32      ±.07     67
Sheet-metal         .1[?]    ±.08     [?]
Foundry             .13      ±.08     72
Printing            .04      ±.08     68
Forge               .09      ±.09     60
Machinist          -.07      ±.08     68
Patternmaking       .17      ±.10     44

He accordingly left machinist and printing out of the old
criterion, calling it a revised criterion, and took into account the
σ's of the several scores, weighting each of a subject's mechanical
marks equally. By correlating the tests with the revised criterion
corrected for range (σ's and averages taken into account),
printing and machinist out, the following r's result:


N = 98 cases

Test                   Correlation With      P.E.r
                       Revised Criterion

General Trade               .32              ±.06
Army Alpha                  .15              ±.07
Stenquist I                 .00              ±.07
Stenquist II                .11              ±.07
Stenquist Assembly          .02              ±.07
M.I.T.                      .16              ±.07

We might conclude from this experiment that in this
prevocational school intelligence is more important in the mind of
the instructor when grading his pupils than is the type of ability
measured by the Stenquist Assembly Test. This varies from
course to course where different instructors are involved. Thus
we have the customary high unreliability of teachers' marks in
academic subjects made more unreliable in these vocational
courses by reason of the instructors having only a few contacts
with the pupil before the final mark is given.

The significant conclusion for vocational guidance would seem
to be that such prevocational courses, while perhaps of
inestimable worth in teaching the pupil a few fundamental facts
about the trade and giving him a basis for interest in the trade,
are of no value for testing purposes if the instructor makes the
rating on such casual relationship and in the ordinary manner in
which school marks are made. It does not necessarily follow that
objective methods of rating the proficiency of boys in such
prevocational courses cannot be made. In fact, there is every
reason to believe that such objective rating can readily be made
by an instructor who is conversant with the principles of mental
testing, and that such ratings would be of great worth in guiding
pupils into occupations in which they have a high chance of
success.

Toops 1 has shown that a similarly conducted bureau of vocational
guidance, which has the pupils in its charge for two weeks,
is  able  to  make  subjective  ratings  which  are  of  worth  and  which 
would  be  of  very  much  more  worth  provided  one  used  more 
objective  methods  of  scoring  and  better  statistical  methods  of 
evaluating  the  test  results. 

The re-tests do bear out the previous findings of low correlations
of intelligence with mechanical measures, especially with the
manipulative type of ability required in the Stenquist Mechanical
Assembly Test.  They also bear out the inference that improvement
in one's mechanical status is more dependent upon age than is
improvement in intelligence or in school work.

From the point of view of testing technique, this experiment
emphasizes the necessity of carefully planning one's complete
testing program in advance, and especially of assuring oneself in
advance that the criterion against which the validity of the test
is to be measured is really reliable and measures what it purports
to measure.  School marks, which have little variability, will
probably be unreliable as a criterion.

Relative  Validity  of  Mechanical  Interest  Tests  and  the 
General  Trade  Test  in  Predicting  Teachers'  Estimates 
of  Potential  Ability  in  the  Trade  Courses 

The  automotive,  electrician,  machinist  and  bookkeeping  groups 
of  students  at  the  Camp  Grant  summer  school  of  1920  were  rated 
by  their  instructors  for  potential  ability  in  the  respective  courses. 


1 Toops, H. A.  Trade Tests in Education, pp. 76-95.  Teachers College
Contributions to Education, No. 115.

Since the General Trade test is known, by its correlations, to
depend  more  upon  intelligence  than  does  the  Mechanical  Interest 
test,  it  becomes  desirable  to  obtain  some  knowledge  of  the  validity 
of  the  two  in  predicting  mechanical  ability.  The  evidence  which 
is  available  is  shown  in  Table  X  below.  The  bold  face  figure  in 
each  compartment  is  the  correlation  of  the  particular  test  in  the 

TABLE X

The Correlations of General Trade Test and M.I.T. with Instructors'
Estimates of Potential Ability of Students in Four Mechanical
Courses

              Correlation of     Correlation of     Correlation of     Correlation of
Course        Potential          Potential          M.I.T. and         Potential
Taken         Ability and        Ability and        General Trade      Ability and
              M.I.T.             204-Question       Test               140-Question
                                 Gen. Tr. Test                         Set of the
                                                                       Gen. Tr. Test

Automotive    .05 ± .09          .19 ± .08          .57 ± .05          .20 ± .08
                        61                 65                 87                 65

Electrician   .50 ± .09          .51 ± .08          .68 ± .06          .53 ± .08
                        34                 35                 34                 36

Machinist     .43 ± .10          .46 ± .11          .73 ± .07          .47 ± .11
                        24                 23                 23                 23

Bookkeeping  -.01 ± .12          .02 ± .13          .70 ± .06         -.02 ± .13
                        33                 28                 37                 28


column with the teachers' estimates of potential ability in the
course named in the row to which the correlation coefficient
belongs.  Thus the Mechanical Interest test correlates
with estimates of potential ability in the automotive course to the
extent of .05 ± .09, and the same test correlates with estimates of
potential ability in the electrical course to the extent of .50 ± .09.
The small figure in the lower right-hand corner of each compartment
gives the number of cases on which the correlation
coefficient is based.  These numbers of cases vary from one correlation
coefficient to another because of incompleteness
in the records of tests given at the time of entrance to the
courses.  These tests were all given before the students began the
courses and consequently are to be looked upon as being true
measures of the efficiency of the respective tests in giving guidance.
The automotive courses were taught by several instructors
and consequently the ratings are necessarily attenuated.  The
General Trade test is the only test which promises to be of value
in this course.  In the electrical and machinist courses, both the
Mechanical Interest test and the General Trade test correlate with
potential ability about .50 and .45 for the two courses respectively.
Neither test predicts bookkeeping ability.  The correlations between
the Mechanical Interest test and the General Trade test, in
the various courses, are about .70, which is considerably higher
than in the case of age groups of New York City boys.

It will be noted that these boys are for the most
part of considerable mechanical inclination and mechanical experience,
and have enough interest in mechanical things to have
at least enrolled for mechanical courses.  The corresponding
intercorrelation of these two tests in the case of the public school
R prevocational boys is likewise .70; these boys live in a small city
with varied mechanical environment, and are very much more
self-reliant mechanically than New York boys.  The intercorrelation
of the two tests in the case of the New York City public
school boys, who have a limited mechanical environment and are
not taking any mechanical courses, averages .50 for age groups.
It  seems  likely  that,  if  ranges  in  ability  were  equated,  the  New 
York  City  boys  would  still  correlate  less  highly  between  these  two 
tests  than  either  of  the  other  two  groups  mentioned  above.  This 
fact  lends  support  to  the  belief  that  practice  in  mechanical 
abilities,  whether  of  the  paper  or  of  the  actual  manipulatory 
variety,  increases  the  correlations  between  the  functions  tested. 

The fourth column of the table shows the correlations of the
selected 140-question set of the General Trade test, thousands of
copies of which were administered to soldiers in the Army E and
R schools during the winter of 1920 and 1921, with the estimates
of potential ability.  This revised set of the General
Trade test has correlations which are almost identical with those of the
longer (204-question) set.

The  Intercorrelations  of  Variables  Bearing  on 
Proficiency  in  the  Machinists'  Course 

At  the  completion  of  the  six  weeks'  course  in  machine  shop 
practice  at  the  Camp  Grant  summer  school  in  1920,  24  machine 
shop  students  rated  each  other  in  regard  to  their  potential 
(ultimate) mechanical ability, and were rated also by their
teachers.  In  addition,  the  school  marks  given  by  the  instructors, 
and  results  of  a  one-word-answer  trade  test,  or  objective 
measure  of  final  proficiency,  were  available. 

The students rated only those students whom they knew well
by picking out the best man, whom they called "best," the second
best man, whom they called "good," the worst man, whom they
called "poorest," and the second poorest man, whom they called
"poor."  By an arbitrary procedure, these ratings were combined
into a final average proficiency rating, it being assumed that any
man who was not rated at all by some one of the 24 students was
"average."  The school marks had been recorded by the instructors
on graphical charts.  Machine shop was divided into a number of
operations, and for each operation the amount of progress so far
attained was indicated by the length of a line drawn; the summation
in inches of the total length of line drawn for each individual was
taken as the school mark.
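
The combining of the students' ratings can be illustrated with a small sketch. The text calls the actual procedure "arbitrary" and does not describe it, so the numerical values below (5, 4, 2, 1, with 3 for "average") and the simple averaging over raters are assumptions made only for illustration; the names and sample ballots are hypothetical.

```python
from statistics import mean

# Hypothetical numeric values for the four labels; the source does not
# give the values actually used.
SCORE = {"best": 5, "good": 4, "poor": 2, "poorest": 1}
AVERAGE = 3  # assumed value for a man whom a given rater did not mention

def combine_peer_ratings(students, ballots):
    """Average each student's score over the ballots of all other students.

    ballots maps a rater to {ratee: label}; a ratee not mentioned by a
    rater is counted as AVERAGE for that rater.
    """
    final = {}
    for ratee in students:
        scores = [
            SCORE.get(ballots.get(rater, {}).get(ratee), AVERAGE)
            for rater in students
            if rater != ratee
        ]
        final[ratee] = mean(scores)
    return final

# Hypothetical ballots for four students (the actual class had 24).
students = ["A", "B", "C", "D"]
ballots = {
    "A": {"B": "best", "C": "poorest"},
    "B": {"A": "good", "D": "poor"},
    "C": {"B": "best"},
    "D": {"B": "good", "A": "poorest"},
}
final = combine_peer_ratings(students, ballots)
for name in students:
    print(name, round(final[name], 2))
```

Counting an unmentioned man as "average" keeps every student on the same scale regardless of how many of his classmates knew him well enough to rate him.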

The teachers' estimates of potential ability were given on a
scale from 1 to 5 after the standard method used in the army, in
which the percentage of people assigned to each of the five steps is
taken to be such that a normal distribution of ability will result.
The trade test score of final proficiency was the score on a set of
one-word-answer questions bearing upon machine shop practice, which the
students presumably had had opportunity to acquire during
their course.

In  addition,  the  scores  on  the  Mechanical  Interest  test,  the  204- 
question  General  Trade  test  and  the  revised  140-question  General 
Trade  test  were  available.  The  intercorrelations  of  all  these  tests 
by  the  rank  difference  method  are  shown  in  Table  XI. 
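
For readers unfamiliar with the rank difference method, a minimal sketch follows. It assumes the usual Spearman formula, ρ = 1 - 6Σd² / (n(n² - 1)), with tied values given their average rank (for which the formula is only approximate); the text does not say how ties were handled or whether ρ was afterwards converted to r. The sample scores are hypothetical.

```python
def ranks(values):
    """Rank from 1 (highest value) downward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def rank_difference_rho(x, y):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical scores of five students on two measures.
teacher_estimate = [1, 2, 3, 4, 5]
trade_test_score = [2, 1, 4, 3, 5]
print(rank_difference_rho(teacher_estimate, trade_test_score))
```

With only 24 cases, as in the machinist course, a method based on ranks is convenient because it needs no assumption about the units in which the several measures are expressed.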

It is interesting to note that although the students' estimates of
potential ability correlate only .55 with the teachers' estimates of
potential ability, and although they correlate with school marks
only to the extent of .63, whereas the teachers' estimates of
potential ability correlate with school marks to the extent of .88,
the students' estimates of potential ability correlate higher, .58 ±
.09, with the trade test measure of final proficiency at the end of the
six weeks' course than do the teachers' estimates of potential
ability, .53 ± .10.  The Mechanical Interest test and the General
Trade test in both forms correlate in the neighborhood of .45

TABLE XI

Intercorrelations by the Rank Difference Method of Variables
in the Machinist Course

N = 24

Variable                                      (1)        (2)        (3)        (4)        (5)        (6)        (7)

(1) Students' Potential Ability Estimates                .55±.10    .63±.08    .58±.09
(2) Teachers' Potential Ability Estimates     .55±.10               .88±.03    .53±.10    .43±.11    .46±.11    .47±.11
(3) School Marks                              .63±.08    .88±.03               .44±.11
(4) Trade Test Final Proficiency              .58±.09    .53±.10    .44±.11
(5) M.I.T.                                               .43±.11
(6) General Trade, 204 Ques.                             .46±.11
(7) Revised General Trade, 140 Ques.                     .47±.11

with teachers' estimates of potential ability.  It undoubtedly is
true that students are able to learn facts of worth regarding their
fellow-workmen's trade capacities that are unobserved by the
teacher.  It is likely that a combination of the students' estimates
with the teachers' estimates would be a more reliable
measure of trade ability than either alone.


CHAPTER  IV 


TESTS  OF  ABILITY  WITH  THINGS  AND  MECHANISMS: 
GIRLS'  TESTS 

Many  investigations  of  school  progress  involving  females  have 
been  carried  out  from  which  the  general  conclusion  has  been 
reached  that  sex  differences  may  be  neglected  for  the  most  part. 
Indeed,  separate  norms  for  the  sexes  have  seldom  even  been 
compiled,  a  fact  which  shows  the  unimportance  of  sex  distinctions 
in  academic  school  work. 

Practically nothing has hitherto been done in investigating
female mechanical ability.  By popular opinion, women are
credited with much less mechanical ingenuity than men.  Observation
of the mechanical environment of the average woman
might readily lead one to believe that, whatever may be the facts
in regard to her innate mechanical capacity, the average woman
must surely have failed to develop it up to a present working
ability on a par with the mechanical ability of the average man.

It is not essential that the two sexes be tested on the same tests,
for sex is a primary classification, of perfect reliability if determined
and recorded at the time of the examination, and of high
reliability if determined from the names alone.  If marked sex
differences do occur, it would be better to use different mechanical
tests for the sexes because of the higher validity which can thus be
secured; if the sex differences in reality are not marked, then a
tryout of tests, made with the aim of differentiating between the
sexes, would reveal the lack of significant sex differences, whereupon
one could discard the plan of different tests for the sexes.
Without, then, any presuppositions in regard to the
innate or acquired mechanical differences of the sexes, it was
decided to construct a girls' mechanical assembly test which
should aim to duplicate for girls the test situations afforded boys
in the Stenquist Mechanical Assembly Test.  Tests of girls in the
meanwhile had shown that girls do poorly on the Stenquist
Assembly Test.  It seemed desirable to base the tests on ability
to construct a model which is present at the time of the test.
Twelve models were finally selected from a much larger number
originally considered.  The selection of the twelve was made on
the basis of scoring objectivity, time required, and whether the
test seemed from a priori reasoning to be a component of general
mechanical ability.

The  Preliminary  Try-Out  of  the  I.E.R.  Assembly  Test 

After  forty  sets  of  the  twelve  tests  had  been  assembled,  they 
were  given  a  preliminary  try-out  on  38  members  of  a  6A1  class,  or 
"bright"  section  of  the  second  semester  of  a  sixth  grade  class. 
The  examiners  kept  careful  watch  at  the  time  of  beginning  each 
of  the  separate  test  elements,  and  recorded  the  cumulative  time 
as soon as five pupils had begun a given element.  In this way, it
was observed that threading three needles, the first test of the set,
required too much time: 10 minutes in the case of many of
the pupils and about 6 minutes for the average pupil of the grade,
whose average age was 10.7 years.  The other elements required
on  the  average  about  4  minutes  each  before  5  pupils  of  the  38 
began  them;  from  these  observations,  it  was  decided  that  about 
45  minutes  was  a  sufficient  over-all  time  for  eleven  test  elements, 
for  it  was  apparent  that  threading  needles  required  too  much  time 
to  be  a  practical  test. 

All  pupils  were  stopped  at  the  end  of  86  minutes  even  if  they 
had  not  finished;  however,  about  45  per  cent  of  the  class  had 
finished  by  the  end  of  80  minutes. 

The order of difficulty, from easiest to hardest, of the twelve
test elements is as follows:


Original   Final                          Total Credits           Per Cent
Letter     Letter   Test Element          Earned by 38 6A1        of Possible
                                          Pupils per Element      380 Credits

C          B        Inserting Tape              320                   84
B          out      Needle Threading            269                   71
A          A        Stringing Beads             258                   68
D          C        Rosette                     244                   64
F          D        Cross Stitch                217                   57
I          E        Key Ring                    160                   42
E          F        Clip Chain                  132                   35
H          K        Trimming Paper              101                   27
K          G        Tape Sewing                 100                   26
J          H        Trunk Tag                    87                   23
L          I        Card Wrapping                57                   15
G          J        Booklet                      48                   13

"Threading needles" was eliminated from further consideration
because  of  the  time  required.  "Trimming  paper"  was  shifted  to 
the  last  position  as  K  of  the  revised  series.  The  positions  of  all 
the  other  tests  were  assigned  to  them  on  the  basis  of  the  above 
percentages,  with  the  exception  that  "stringing  beads"  and 
"inserting  tape"  were  reversed,  in  order  to  place  "stringing 
beads"  as  the  first  test  even  though  it  is  slightly  more  difficult 
than  "inserting  tape."  This  was  done  as  it  was  felt  that  the  box, 
which  is  the  only  variation  from  the  envelope  containers  of  the 
other  models,  could  most  readily  be  explained  if  given  first. 
Since,  also,  every  girl  has  at  some  time  in  her  life  strung  beads,  it 
was  a  task  which  would  appeal  to  most  of  them.  The  principal 
of the school took the test with the pupils and made 107 out of a
possible 110 points on the final selection of eleven elements.
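
The percentages in the table of difficulty above follow directly from the credits earned, as the sketch below shows. It assumes 10 possible credits per element (inferred from the 380 possible credits for 38 pupils) and supplies the name "Rosette" for the fourth element from the later reference to C, "rosette"; both are inferences rather than statements of the text.

```python
# Total credits earned by the 38 pupils on each element, from the table above.
credits = {
    "Inserting Tape": 320,
    "Needle Threading": 269,
    "Stringing Beads": 258,
    "Rosette": 244,  # element name inferred; it is missing from the table row
    "Cross Stitch": 217,
    "Key Ring": 160,
    "Clip Chain": 132,
    "Trimming Paper": 101,
    "Tape Sewing": 100,
    "Trunk Tag": 87,
    "Card Wrapping": 57,
    "Booklet": 48,
}

PUPILS = 38
CREDITS_PER_ELEMENT = 10  # assumed: 380 possible credits / 38 pupils
possible = PUPILS * CREDITS_PER_ELEMENT

# Per cent of possible credits, rounded to the nearest whole per cent.
percent = {name: round(100 * earned / possible) for name, earned in credits.items()}

# Order of difficulty, easiest (highest per cent) first.
easiest_first = sorted(percent, key=percent.get, reverse=True)

for name in easiest_first:
    print(f"{name:<18} {credits[name]:>4} {percent[name]:>4}")
```

The same arithmetic gives the principal's possible score: eleven elements at 10 credits each make 110 points.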

The  credits  earned  on  each  of  the  eleven  elements,  A  to  K,  of 
the  final  series  as  adopted  for  the  final  scale,  by  various  grade 
groups  tested  later  are  given  in  Table  XII.  C,  "rosette,"  is 
somewhat  easier  than  its  position  would  indicate,  and  its  position 
might be shifted from third place to second by exchanging places
with "tape inserting."  Similarly, "paper trimming" is somewhat
easier than its position would indicate, and might be placed just
before  J,  the  booklet.  It  was  subsequently  placed  as  K  for  the 
reason  that  there  was  difficulty  in  the  subjects  knowing  what  was 
required,  a  difficulty  which  it  is  believed  will  be  partially  obviated 
by  the  present  revised  printed  form.  The  differences  in  difficulty 
are  so  slight  that,  for  the  present,  it  is  scarcely  worth  the  while  to 
make  these  changes.  The  order  of  difficulty  of  the  test  elements 
is  substantially  the  order  in  which  they  are  placed  in  the  scale. 

Results  from  Tests  of  Boys  on  the  I.E.R.  Girls' 
Assembly  Test 

The I.E.R. Girls' Assembly Test was administered to 30 boys in
the 7A grade of public school B.  As shown by Table XIII, these
boys have slightly higher averages in the Stenquist Assembly,
Arithmetic-Reading intelligence combination, and Combined Stenquist
Picture tests than 13-year-old boys in general.  As a group
they are thus more than equal to the average 13-year-old boy in
all these tests.  When compared with the norm of the 13-year-old
girls on the I.E.R. Girls' Assembly Test, the average of the boys'
scores, 38.2, is at about the 35th percentile of 13-year-old girls' performance.

TABLE XIII

The Averages and Standard Deviations of the Four Tests Given to
Thirty 7A Boys, and to All 13-Year-Old Boys of Public School B

                                 I.E.R.      Sten.       Arith.-     Combined
                                 Assembly    Assembly    Re.         Sten.
                                                                     Picture

Averages
  Our Group of Boys (N = 30)     38.2        39.6        70.7        28.4
  All 13-Year-Old Boys                       38.4        64.8        26.8

σ's
  Our Group of Boys (N = 30)     16.7        21.0        13.4         6.7
  All 13-Year-Old Boys                       19.1        18.6         9.6

Our  small  test  group  has  slightly  higher  averages  in  Stenquist 
Assembly,  Intelligence  and  Combined  Stenquist  Picture  tests, 
with  more  variability  in  Stenquist  Assembly  but  less  variability 
in  Intelligence  and  Combined  Stenquist  Picture  tests  than  the 
13-year-old  boys  in  general. 

The  correlations  of  the  I.E.R.  Assembly  Test  with  the  three 
above  mentioned  tests  are  shown  in  Table  XIV. 

TABLE  XIV 

Correlations  of  the  I.E.R.  Girls'  Assembly  Test  with  Three  Others 
in  the  Case  of  Thirty  7A  Boys  in  Public  School  B 

                                              Correlation
                                              with I.E.R.

Arithmetic-Reading Intelligence                .12 ± .12
Combined Stenquist Picture                     .50 ± .09
Stenquist Assembly                             .53 ± .09

This  table  shows  that  the  I.E.R.  Assembly  Test  does  not 
depend  very  much  upon  intelligence  in  the  case  of  boys,  and  that 
it  does  depend  upon  the  abilities  measured  by  other  mechanical 
tests  about  as  highly  as  these  other  tests  depend  upon  each 
other.

Mechanical Tests in the Case of an Ungraded Class of
Girls with Conclusions in Reference to the Improvability
Through Practice on the Stenquist Assembly
Test and the I.E.R. Girls' Assembly Test

The  two  mechanical  assembly  tests  were  administered  to  15 
pupils  above  the  age  of  11  in  the  girls'  school,  public  school  G, 
who  had  done  such  poor  academic  work  that  they  had  been  placed 
in  the  ungraded  class.  This  class  consists  only  of  those  whom  the 
school  authorities  consider  to  be  so  mentally  defective  as  to  be 
incapable  of  worthwhile  progress  in  the  normal  classes.  In  the 
ungraded  class  they  are  given  individual  attention  by  the  teacher, 
with  a  large  amount  of  emphasis  upon  handwork  and  training  in 
household  duties  rather  than  upon  the  formal  academic  type  of 
instruction.  The  average  Stenquist  Assembly  Test  score  for 
these  15  ungraded  cases,  average  age  of  14.3,  was  26  points;  the 
average  score  of  all  pupils  in  the  school,  180  cases,  13  years  of  age 
or  over  found  in  grades  3B  to  6B  inclusive  was  only  18  points. 
In  other  words,  although  the  members  of  the  ungraded  class  are 
considered  mentally  defective  by  the  school  authorities,  they  made 
8  points  higher  average  score  on  the  Stenquist  Assembly  Test 
than  the  general  run  of  pupils  in  the  school  whose  average  age  is 
in  excess  of  13  years.  This  difference  might  be  thought  of  as 
possibly  due  to  the  difference  in  age  of  the  two  groups;  but  when 
we  examine  the  norms  of  the  13-  and  14-year  old  girls,  we  find 
that  the  norm  of  the  13-year-old  girls  is  about  19  while  that  of  the 
14-year-olds  is  23.  These  norms  include  the  brighter  13-  and 
14-year-olds  from  public  school  J,  which  makes  this  an  adequate 
sampling  of  those  ages.  Our  ungraded  group  is  therefore  at  least 
3  points  superior  in  the  Stenquist  Assembly  Test  to  the  median  of 
all  the  14-year-old  girls. 

The average I.E.R. Girls' Assembly Test score of these ungraded
pupils was 31.  Surprisingly, this is 13 points lower than
the norm for either the 13-year-old or the 14-year-old girls, both
ages having the same norm.  The I.E.R. Assembly Test is,
however, known to depend more upon intelligence than does the
Stenquist Assembly Test, as determined by their respective correlations
with intelligence.  We would naturally expect an
unusual amount of mechanical practice of girls to be most effective
in producing differential average mechanical test scores of
the girls when that practice is directed into such mechanical
channels as the majority of girls do not enter, namely, the boys'
mechanical tests.

The correlation between the Stenquist Assembly Test and the
I.E.R. Assembly Test in the case of these girls is .80.  This
correlation is twice the magnitude of the average intercorrelation
of age groups for these tests, which indicates a relationship that is
several times as great when measured in terms of the reduction of
the standard error of estimate.  Part of this relationship is due
to the larger range of ability included; and without being able to
estimate the effect of difference in range of ability, we are unable
to predict to what extent practice in mechanical things has improved
the correlation between the two tests.  It seems rather
impossible for difference in range of ability to account for all the
differences in correlation, since this correlation coefficient of .80
is considerably higher than the reliability coefficient of the Stenquist
Assembly Test (.60) for seventh and eighth grade groups of
public school boys.  The evidence thus points to a marked increase
in relationship between the two tests due to the practice
in mechanical things to which this class had been subjected.
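
The statement that a correlation twice as large indicates a relationship "several times as great" can be made concrete. The sketch below assumes the usual standard error of estimate, σ√(1 - r²), so that the proportional reduction in error is 1 - √(1 - r²), and takes the average intercorrelation of age groups as .40, i.e. half of .80; the function name is illustrative only.

```python
import math

def error_reduction(r: float) -> float:
    """Proportional reduction of the standard error of estimate below sigma:
    1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)

r_ungraded = 0.80               # correlation found for the ungraded class
r_age_groups = r_ungraded / 2   # assumed: "twice the magnitude" implies .40

high = error_reduction(r_ungraded)    # reduction at r = .80
low = error_reduction(r_age_groups)   # reduction at r = .40
ratio = high / low

print(f"reduction at r = .80: {high:.3f}")
print(f"reduction at r = .40: {low:.3f}")
print(f"ratio: {ratio:.1f}")
```

On these assumptions a correlation of .80 reduces the error of estimate by 40 per cent and a correlation of .40 by only about 8 per cent, a ratio of nearly five to one.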

The  I.E.R.  Assembly  Test  as  a  Measure  of  a  Distinct 
Ability  or  Group  of  Abilities 

The I.E.R. Assembly Test was administered together with
other tests to 318 girls of ages 12 to 15.  Its correlations with
these other tests and with certain facts from the school records are
given in Table V on page 22 of Chapter III.  These correlations
demonstrate that the test measures an ability less closely allied to
ability with ideas and to success in school work than to the ability
measured by the low-level clerical tests and the Stenquist Assembly
Test.  It seems to do for the girls what the Stenquist Assembly
Test does for boys, but not so clearly and emphatically.

The  Determination  of  the  Mechanical  Interest  of  Girls 

If it be assumed, as seems reasonable, that one cannot possess
an interest in anything, however elementary its nature, until he
knows something about it, we have a basis for constructing interest
tests.  This principle, when applied to mechanical things,
assumes that an individual who is interested in mechanical things
will have normally and without undue effort absorbed and retained,
to the point of psychological recall, bits of amateurish
information about mechanical things.

With this point of view in mind, a list of 200 questions, designed
to test amateur knowledge of many mechanical situations found
in women's mechanical environment, has been collected from a
number of sources.  Each question can be answered with a single
word.  The recall form of question eliminates the guessing
element of a true-false or multiple choice form of test.  For the
number of questions it contains, the recall form invariably
has a high reliability.

This  test  was  not  used  after  it  was  decided  to  delimit  the  scope 
of  this  inquiry  by  not  investigating  the  relationship  of  interest  and 
ability.  It  is  given  here  for  the  benefit  of  anyone  who  may  care 
to  experiment  with  this  type  of  test. 

Girls'  General  Trade  Test 

1.  What tool do you use to tighten up a roller
skate which is loose on your shoe?

2.  What does a shoemaker put on his thread
before sewing a shoe with it?

3.  What part of a wall clock regulates the time?

4.  With what material are the hammers of a
piano covered?

5.  How many sheets of paper does a printer
call a ream?

6.  What substance is often used in rain water
filters to absorb the foul gases in the water?

7.  What do you call the part of a fishing rod on
which the fishing line rolls up?

8.  What would you put on a cork to keep it
from absorbing water?

9.  What wheel in a watch regulates the time?

10.  How many valves are there in a hand lift or
suction pump?

11.  What liquid is used in an artificial ice-making
plant to freeze the water by its evaporation?

12.  What wood is best for making clothes chests?

13.  Of what material are photographic films
made?

14.  What do you call the narrow strips of wood
which  are  nailed  all  over  the  inside  of  a 
house  to  hold  the  plaster  on? 

15.  What  tool  do  you  use  to  sink  the  head  of  a 
finishing  nail  below  the  surface  of  the  wood 
on  varnished  work? 

16.  What  do  you  call  the  timbers  to  which  the 
sheeting  for  a  roof  is  fastened? 

17.  What  do  you  call  the  mudlike  substance 
used  to  hold  window  glass  in  place? 

18.  What  is  mixed  with  water  to  make  white- 
wash? 

19.  What  tool  does  a  bricklayer  use  to  spread 
his  mortar? 

20.  When  cement  plaster  is  put  on  the  outside  of 
a  house  for  a  finish  coat,  what  is  it  called? 

21.  What,  besides  sand,  gravel  and  water,  is 
used  in  making  concrete? 

22.  What  do  you  call  the  crook  in  the  waste  pipe 
under  a  wash  sink  to  prevent  sewer  gas  from 
getting  into  the  house? 

23.  What  do  you  call  the  part  of  a  faucet  which 
you  replace  in  order  to  stop  it  from  dripping? 

24.  In  what  part  of  a  steam-heating  system  is 
the  steam  made? 

25.  What  is  often  provided  near  the  top  of  a 
steam  radiator  to  let  the  air  out? 

26.  What  part  of  a  furnace  do  you  open  to  make 
the  fire  burn  hotter? 

27.  If  the  base  of  a  right  triangle  is  3  feet  and  the 
height  is  4  feet,  what  is  the  length  in  feet  of 
the  third  or  longest  side? 

28.  In  a  quarter-sized  drawing,  how  many  inches 
on  the  drawing  stand  for  one  foot  on  the 
object? 

29.  What  do  you  call  a  saw  which  you  use  to  cut 
iron  rods? 

30.  What  do  you  call  the  pin  sometimes  used  to 
keep  a  nut  from  coming  off  a  bolt? 

31.  What  do  you  call  a  caliper  which  will 
measure  to  the  thousandth  part  of  an  inch? 

32.  What  do  you  call  the  heavy  iron  tool  on 
which  a  blacksmith  holds  his  horseshoe  to 
pound  it? 

33.  What  tool  does  a  blacksmith  use  to  pick  out 
a  hot  horseshoe  from  the  fire? 

34.  How  is  a  broken  iron  rod  mended? 
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35.  What  do  you  call  the  twelve-pound  hammers 

used  by  blacksmiths?   

36.  What  kind  of  water  is  used  to  fill  a  storage 
battery? 

37.  What  do  you  call  the  connection  when  dry 
batteries  are  connected  each  with  its  carbon 
wired  to  the  zinc  of  the  next  nearest  one?   

38.  What  is  the  voltage  of  an  ordinary  house 
electric  lighting  system?   

39.  Of  what  material  are  the  white  tubes  made 
which  are  used  to  insulate  electric  light  wires 
where  they  go  through  a  thin  wooden  wall  or 
partition?   

40.  What  acid  is  used  in  the  solution  of  a  storage 
battery?   

41.  What  safety  device  is  used  in  house  wiring  to 
protect  the  electrical  circuit  from  too  heavy 

a  current?   

42.  What  do  you  call  the  iron  pipe  through 
which  electric  wires  are  run  in  house  wiring?   

43.  What  do  you  divide  the  number  of  volts  by 
in  order  to  get  the  number  of  amperes 
which  are  flowing  through  an  electrical 
circuit?   

44.  What  instrument  reduces  the  power  line
electrical  voltage  low  enough  for  household
use?

45.  What  instrument  is  used  to  test  a  storage 
battery  solution?   

46.  What  must  be  done  to  automobile  cylinders
after  they  become  badly  worn?

47.  What  part  of  an  automobile  engine  connects
and  disconnects  the  flywheel  and  the
engine  shaft?

48.  What  instrument  is  used  on  the  dashboard
to  show  the  speed  at  which  an  automobile
is  traveling?

49.  What  part  of  an  automobile  deadens  the
noise  of  the  exhaust?

50.  In  what  part  of  an  engine  do  the  pistons
work  back  and  forth?

51.  What  substance  is  sifted  on  an  automobile 
inner  tube  to  keep  it  from  sticking  to  the 
casing?   

52.  What  would  you  do  to  prevent  an  automobile
from  kicking  back  when  cranking  it?

53.  What  kind  of  bearings  are  used  in  a  bicycle
to  prevent  friction?

54.  What  is  the  diameter  in  inches  of  most  bicycle
wheels?

55.  What  do  you  call  the  toothed  wheels  over 
which  a  bicycle  chain  runs? 

56.  How  many  cycles  do  most  motorcycles 
have? 

57.  What  do  you  call  the  large  wheel  on  the  side 
of  an  engine  to  make  it  run  steadily? 

58.  What  tool  is  used  to  sharpen  a  garden  hoe 
while  at  work  in  the  garden? 

59.  What  is  used  to  hold  an  axe  head  tightly  on 
the  handle? 

60.  Of  what  kind  of  stone  are  grindstones  made? 

61.  Of  what  wood  are  the  best  axe  handles 
made? 

62.  After  mixing  the  baking  powder  and  salt  in 
the  flour  for  biscuits  what  do  you  do  to  the 
flour  mixture  before  putting  in  the  eggs  and 
shortening? 

63.  What  liquid,  combined  with  salt,  makes  a 
good  homemade  polish  to  remove  the  tarnish 
from  brass? 

64.  What  often  forms  on  brass  so  that  it  will  not 
stay  polished? 

65.  What  finish  is  often  given  brass  beds  in  the 
factory  to  keep  them  bright  and  polished? 

66.  What  tool  is  used  to  make  the  holes  in  eyelet 
embroidery? 

67.  How  many  needles  would  one  need  to  buy  in 
order  to  knit  a  woolen  stocking  by  hand? 

68.  What  do  you  call  the  blunt-point  tool  which
is  used  for  drawing  baby  ribbon  into  insertion?

69.  How  many  stitches  are  taken  between  each
locking  of  threads  in  hemstitching?

70.  What  letter  on  a  standard  typewriter  keyboard
is  immediately  to  the  right  of  the
letter  s?

71.  What  do  you  call  the  lever  which  you  press 
on  a  typewriter  to  get  capital  letters? 

72.  What  do  you  call  the  large  rubber  roll  which 
feeds  up  the  paper  on  a  typewriter? 

73.  What  part  of  a  telephone  switchboard  do  the 
plugs  fit  into? 

74.  What  do  you  call  the  thin  pieces  of  sheet  iron
behind  the  mouthpiece  in  a  telephone  transmitter?

75.  Of  what  material  are  telephone  transmitter
mouthpieces  made?

76.  What  stiff  material  would  you  use  for  making
a  hat  foundation?

77.  When  hemstitching  is  cut  in  two  lengthwise, 
what  do  you  call  the  points  which  remain  on 
the  sides  of  the  material? 

78.  What  is  the  name  of  the  material  which  you 
use  to  set  sleeves  in  a  garment? 

79.  What  do  you  cut  in  the  seam  of  a  garment  to 
keep  the  material  from  unraveling? 

80.  On  what  part  of  the  phonograph  does  a 
phonograph  record  rest  while  being  played? 

81.  From  what  part  of  the  material  would  you 
always  cut  binding? 

82.  Of  what  material  are  watch  springs  made? 

83.  What  cooked  vegetable  may  be  used  as  a 
substitute  for  glue  or  paste? 

84.  From  what  material  are  the  strongest 
crochet  hooks  made? 

85.  What  do  you  call  the  laundry  machine  which 
is  used  for  smoothing  large  flat  work  like 
sheets  and  towels? 

86.  What  chemical  is  sometimes  put  in  corn  to 
make  it  keep  when  canned  at  home? 

87.  What  do  you  put  in  sour  milk  to  sweeten  it? 

88.  What  do  you  call  the  machine  on  which 
homemade  carpets  are  woven? 

89.  What  do  you  call  the  thread  used  lengthwise 
of  strip  carpet  to  hold  the  filling  together? 

90.  To  what  part  of  a  horse's  bridle  are  the  lines 
or  reins  fastened? 

91.  What  one  numeral  is  not  shown  on  any  keys 
of  the  adding  machine? 

92.  What  do  you  use  to  propel  a  canoe  in  the 
water? 

93.  What  is  used  on  the  stern  of  a  sailboat  to 
steer  the  boat? 

94.  How  many  strings  has  a  violin? 

95.  What  acid  is  used  in  most  eyewashes  for 
babies? 

96.  What  is  the  best  substitute  for  a  part  of  the 
eggs  in  a  recipe  calling  for  many  eggs? 

97.  What  do  you  call  the  long  stitches  used  to 
hold  two  pieces  together  while  sewing  them 
on  the  machine?

98.  What  do  you  call  the  spool  on  which  the
lower  thread  of  a  sewing  machine  is  wound?

99.  What  part  of  the  head  of  a  sewing  machine 
keeps  the  material  from  slipping? 

100.  When  using  thread  of  any  size  between  No. 
40  and  No.  70,  what  size  sewing  machine 
needle  would  you  use? 

101.  What  do  you  call  the  part  of  a  typewriter 
which  makes  the  ribbon  go  up  and  down? 

102.  What  do  you  call  the  part  of  an  electric  light 
fixture  into  which  the  bulb  screws? 

103.  What  tool  does  a  paper  hanger  use  to  press 
down  the  seams? 

104.  What  do  you  call  the  beading  which  a  paper
hanger  sometimes  puts  around  the  room  at
the  height  of  about  3½  feet?

105.  What  tool  does  a  shoemaker  use  to  make 
the  holes  for  the  shoe  tacks? 

106.  What  tool  does  a  butcher  use  to  cut  a  bone
or  gristle  which  is  a  little  too  small  to  be
sawed?

107.  What  tool  does  a  butcher  use  to  put  a  keen 
edge  on  his  knife  without  regrinding  it? 

108.  What  gas  is  used  for  fumigating  after  a  contagious
sickness  or  to  kill  roaches  and
crickets?

109.  What  chemical  in  bleaching  powder  gives  it 
its  bleaching  power? 

110.  What  liquid  would  you  use  to  take  shellac  or
varnish  off  a  window?

111.  What  liquid  would  you  use  to  cut  the  rust 
off  a  piece  of  steel? 

112.  What  do  you  drop  into  homemade  wood 
ashes  lye  to  test  its  strength? 

113.  What  do  you  drop  into  coffee  to  make  it 
clear? 

114.  What  do  you  add  to  grease  to  make  soap? 

115.  When  lard  is  put  in  cakes  or  bread,  what  do
you  call  it?

116.  What  tool  would  you  need  to  uncouple  a 
hose  from  a  faucet  or  hydrant  if  it  had  stuck 
tightly? 

117.  What  do  you  call  the  joint  in  the  corner  of  a 
picture  frame? 

118.  What  calibre  are  most  target  and  small
game  hunting  rifles?

119.  What  tool  does  a  glazier  use  to  drive  in
glaziers  points?

120.  Of  what  metal  are  glaziers  points  made?

121.  What  do  you  dissolve  in  flour  paste  to  make
sizing  for  paper  hanging?

122.  What  tool  is  needed  to  hang  a  screen  door?

123.  What  is  the  main  ingredient  of  all  omelettes?

124.  In  what  do  you  cook  oatmeal  to  keep  it  from
scorching?

125.  What  besides  egg,  flavoring  and  seasoning
is  used  in  making  custard?

126.  What  is  the  main  ingredient  of  turkey
dressing?

127.  What  flavoring  is  used  for  eggnog?

128.  What  are  stuck  into  the  fat  of  the  ham  in
making  baked  ham?

129.  In  what  kind  of  pickles  is  the  brine  allowed
to  ferment?

130.  What  kind  of  cake  takes  many  eggs?

131.  In  making  cornbread  with  sweet  milk,  what
would  you  put  in  to  make  it  rise?

132.  What  ingredient,  usually  used  in  cakes,  is
left  out  of  sponge  cake?

133.  What  kind  of  stone  is  used  for  heat  in  a  fireless
cooker?

134.  Through  what  kitchen  utensil  do  you  rub
the  apple  pulp  in  making  apple  butter?

135.  What  do  you  use  to  clarify  grease  that  has
been  used?

136.  What  is  the  watery  part  of  sour  milk  called?

137.  Of  what  material  are  the  mats  made  which
are  used  under  pots  or  pans  to  prevent
scorching?

138.  What  part  of  a  single  harness  is  fastened  to
the  singletree  or  whiffletree?

139.  What  do  you  call  the  part  of  an  ice  skate
which  touches  the  ice  while  skating?

140.  What  small  attachment  on  a  camera  shows
the  picture  which  is  being  taken?

141.  What  do  you  call  the  way  in  which  ribbon  is
cut  so  that  it  will  not  ravel?

142.  What  ingredient  is  browned  in  making
brown  gravy?

143.  What  do  you  call  the  sauce  which  is  made  of 
flour,  water,  butter  and  seasoning  only? 

144.  At  10  cents  a  kilowatt  hour,  how  much  will 
it  cost  to  use  a  500-watt  electric  iron  for  4 
hours? 

145.  What  do  you  do  to  the  first  coat  of  varnish 
before  giving  it  the  second  coat  if  you  want  a 
fine  polish? 

146.  What  do  you  use  on  white  pine  or  spruce  to
give  it  a  mahogany  finish?

147.  What  are  used  under  the  legs  of  a  dresser  so 
that  it  can  be  moved  about  easily? 

148.  What  oil  is  used  in  mixing  gilt  paint? 

149.  What  do  you  put  on  baked  sweet  potatoes  to 
make  them  brown? 

150.  What  besides  salt,  water  and  color  is  needed 
to  make  salt  beads? 

151.  What  do  you  put  in  the  hot  water  in  which 
you  are  washing  rubber  rings? 

152.  How  many  X's  stand  for  confectioner's 
sugar? 

153.  What  do  you  call  the  mixture  of  confectioner's
sugar  and  water  used  in  candy?

154.  What  is  the  main  ingredient  of  marshmallow?

155.  What  do  you  call  a  narrow  piece  of  material
of  contrasting  color  used  for  decorative  effect
as  in  the  vertical  seams  of  a  skirt?

156.  What  do  you  call  the  pump  which  is  used  to 
clear  out  clogged  plumbing  fixtures? 

157.  What  kind  of  stitch  is  used  on  a  salt  sack  so 
that  it  will  easily  ravel? 

158.  What  color,  other  than  green,  could  you  dip 
a  blue  dress  into  in  order  to  dye  it  green? 

159.  What  liquid  do  you  use  to  set  the  color  when 
dyeing  a  dress  red? 

160.  What  is  put  on  the  back  of  a  piece  of  glass  to
make  a  mirror  out  of  it?

161.  In  making  a  hat  frame,  what  do  you  call  the 
wire  which  you  use  to  fasten  together  two 
wires  where  they  cross? 

162.  In  a  hat  frame,  what  do  you  call  the  wires 
which  run  from  the  crown  to  the  edge  of  the 
brim? 


Ability  With  Things  and  Mechanisms:  Girls  Tests 

163.  What  does  a  dressmaker  use  to  take  her 
measurements?   

164.  What  do  you  call  the  very  long  stitches  used 
by  a  dressmaker  to  hold  the  parts  together 
while  fitting  the  dress?   

165.  What  do  you  call  the  stitch  used  to  prevent 
raveling  on  the  edges  of  a  seam?   

166.  Of  what  color  should  bedsteads  in  a  sick
room  be?

167.  What  is  the  process  of  disinfecting  a  room  by
gas  called?

168.  What  is  added  to  milk  to  keep  it  from  curdling
when  making  creamed  tomato  soup?

169.  What  kind  of  bones  are  used  to  flavor  bean 
soup?   

170.  What  vegetables  are  used  in  succotash?   

171.  What  kind  of  meat  is  used  in  Irish  stew?   

172.  From  what  part  of  the  beef  is  porterhouse 
steak  cut?   

173.  What  meat  is  used  to  season  baked  beans?   

174.  What  herb  is  sometimes  used  to  season  sausage?

175.  What  do  you  do  to  cotton  batting  before 
putting  it  in  a  new  sofa  pillow  to  keep  it  from 
matting?   

176.  What  do  you  do  to  a  fruit  jar  top  to  make  it 
unscrew  easily?   

177.  What  are  the  two  principal  ingredients  used
in  French  dressing?

178.  What  do  you  call  the  process  of  setting  the
colors  after  hand  painting  on  china?

179.  What  liquid  is  applied  to  charcoal  drawings
to  fix  them?

180.  What  is  used  to  drive  the  tools  in  embossing
leather?

181.  Of  what  material  is  the  point  of  a  pyrographic
needle  made?

182.  Of  what  material  are  the  victrola  needles
made  that  give  the  softest  tones?

183.  Of  what  material  are  the  best  bathtubs
made?

184.  What  should  you  do  to  an  oil  lamp  wick  before
extinguishing  the  light?

185.  What  happens  to  iron  saucepans  when  they
are  put  away  damp?

186.  When  clothes  are  left  in  a  warm,  wet  condition
for  several  hours,  what  do  you  call  the
injurious  result?

187.  What  common  household  liquid  will  remove
paint  easily  from  window  glass?

188.  What  common  article  of  food  is  used  to
"set"  pink,  green  or  black  colors  when  dyeing?

189.  When  sleeves  are  larger  than  the  armhole
where  should  the  greatest  amount  of  fullness
come?

190.  What  do  you  call  celluloid  glasses  worn  to
protect  the  eyes  against  sun-glare  and  dust?

191.  What  do  you  put  under  a  hot  dish  of  food  to
keep  the  heat  from  injuring  the  table?

192.  After  giving  a  baby  its  bath,  what  should
you  put  on  it  to  prevent  chafing?

193.  What  is  the  name  of  the  small  vegetable
which  looks  like  an  onion  and  is  used  to
season  food?

194.  What  vessel  besides  a  coffee  pot  is  most
commonly  used  to  make  coffee?

195.  What  do  you  call  the  piece  of  material  which
you  sew  on  a  worn  place  in  a  garment?

196.  What  do  you  call  the  slimy  mass  which  forms
in  vinegar  when  it  stands  for  a  long  time?

197.  In  making  bread  what  do  you  put  in  it  to
make  it  rise?

198.  What  fruit  is  frequently  cooked  in  with
bread  or  rice  pudding?

199.  Of  what  material  is  the  best  heavy  thread
made  which  is  used  for  sewing  on  overcoat
buttons?

200.  What  tool  is  quickest  to  use  in  making
whipped  cream?

201.  What  cloth  is  made  from  the  hair  of  angora
goats?

202.  What  do  you  sometimes  put  on  thread  to
make  it  stronger  for  sewing  on  buttons?

203.  What  do  you  do  to  cream  in  order  to  make  it
whip?

204.  What  do  you  call  the  boiler  which  will  keep
rice  from  scorching?

A  True-False  Test  of  Girls'  Mechanical  Interest

One  of  the  values  of  the  true-false  test  is  that  bits  of  information
which  cannot  be  used  to  advantage  in  other  forms  of  tests  can
readily  be  adapted  to  this  form  of  test.  One  can  also  readily
raise  the  question  as  to  which  of  two  practices  is  the  better,  when
there  are  reasons  why  one  should  be  preferred  to  the  other.

The  following  list,  which  is  but  a  slight  beginning,  shows  how 
readily  this  form  of  interest  test  can  be  constructed.  Practically 
all  of  the  questions  in  the  Girls'  Recall  General  Trade  Test  can 
readily  be  adapted  to  this  form  of  test.  Like  the  former,  this  test 
was  not  used  after  it  was  decided  not  to  include  a  study  of  mechanical
interest  in  this  investigation.

1.  No.  50  sewing  thread  is  a  very  coarse  sewing  thread.  T  F

2.  Only  three  needles  are  used  in  knitting  the  heel  of  a 
woolen  stocking.  T  F 

3.  Rose  color  is  a  kind  of  purple.  T  F 

4.  Lavender  is  a  shade  of  yellow.  T  F 

5.  A  bodkin  is  a  very  small  darning  needle.  T  F 

6.  Basting  thread  is  stronger  than  sewing  thread.  T  F 

7.  The  lengthwise  threads  of  cloth  are  stronger  than  the 
crosswise  threads.  T  F 

8.  The  eye  of  a  sewing  machine  needle  is  in  the  top.  T  F 

9.  The  flat  part  of  a  sewing  machine  needle  is  at  the  end 
which  contains  the  eye.  T  F 

10.  A  white  dress  may  be  dyed  pale  pink.  T  F 

11.  A  dark  blue  dress  may  be  dyed  pale  green.  T  F 

12.  A  typist  is  a  person  who  sets  type  in  a  printing  office.  T  F 

13.  The  soles  of  shoes  are  sometimes  made  of  paper.  T  F 

14.  An  automobile  may  be  brought  to  a  dead  stop  within
five  feet  when  the  machine  is  traveling  at  fifty  miles
an  hour.  T  F

15.  Gingham  is  used  principally  for  making  tablecloths.  T  F 

16.  Damask  is  much  used  in  evening  gowns.  T  F 

17.  Wisteria  blooms  at  the  same  time  as  clematis.  T  F 

18.  The  Hoover  is  a  brand  of  breakfast  food.  T  F 

19.  Postum  is  a  tooth  paste.  T  F 

20.  Crisco  is  a  scouring  cleaner.  T  F 

21.  A  knitting  needle  ends  in  a  hook.  T  F 

22.  New  potatoes  are  scraped  instead  of  peeled.  T  F 

23.  New  potatoes  make  good  mashed  potatoes.  T  F 

24.  Fruit  cake  is  best  if  made  only  one  day  before  eating
it.  T  F

25.  Stale  bread  is  better  than  fresh  bread  in  turkey
dressing.  T  F

26.  The  U.  S.  Government  allows  the  use  of  benzoate  of
soda  to  preserve  catsup  but  not  milk.  T  F

27.  Beet  sugar  is  cheaper  than  cane  sugar.  T  F

28.  Beet  sugar  is  sweeter  than  cane  sugar.  T  F 

29.  "Soft  white"  sugar  is  always  granulated.  T  F 

30.  Ammonia  in  dishwater  will  cut  the  grease.  T  F 

31.  Mustard  is  a  good  emetic.  T  F 

32.  Goldenrod  blooms  in  the  spring.  T  F 

33.  Zinnias  are  one  of  the  earliest  blooming  spring  plants.  T  F 

34.  Bloodroot  is  a  woods  flower,  blooming  in  the  fall.  T  F 

35.  Radishes  are  ready  for  eating  five  weeks  after
planting.  T  F

36.  Round  steak  is  the  most  expensive  cut  of  steak.  T  F 

37.  Bacon  is  cut  from  the  sides  of  the  hog.  T  F 

38.  Neck  cuts  are  used  mostly  for  frying.  T  F 

39.  Tripe  is  made  from  sweetbreads.  T  F 

40.  If  ashes  are  allowed  to  accumulate  in  a  grate  they
will  cut  off  the  draught.  T  F

41.  A  small  pipe  in  a  kitchen  sink  is  less  likely  to  choke
with  grease  than  a  large  one.  T  F

42.  Grease  is  easily  removed  from  drain  pipes  by  the  use
of  potash  and  hot  water.  T  F

43.  Fresh  grapes  may  be  kept  for  several  weeks  by
packing  them  in  sawdust.  T  F

44.  Enameled  iron  is  sometimes  used  for  making
bathtubs.  T  F

45.  A  needle  shower  is  always  a  cold  shower  bath.  T  F 

46.  Waste  pipes  should  never  be  made  from  cast  iron.  T  F 

47.  Brass  pipes  are  not  corroded  by  ordinary  water.  T  F 

48.  Water  pipes  usually  burst  when  the  water  in  them 
freezes.  T  F 

49.  A  figured  carpet  wears  longer  than  a  plain  carpet  of
the  same  quality.  T  F

50.  If  a  person's  ears  are  frozen  rubbing  them  with  snow
will  take  out  the  frost.  T  F

51.  A  solution  of  water  and  baking  soda  is  good  for  scalds
and  wasp  stings.  T  F

52.  Confectioner's  sugar  always  is  marked  with  five  X's
in  a  row.  T  F

53.  Goods  bought  in  bulk  are  usually  cheaper  than  those 
bought  in  the  package.  T  F 

54.  The  best  artificial  light  may  be  secured  from  candles.  T  F 

55.  Candles  may  be  purchased  by  the  pound  or  in  packages.  T  F

56.  The  charred  parts  of  a  burned  lamp  wick  should  be 
pinched  off.  T  F 

57.  If  an  oil  lamp  is  filled  entirely  full  there  is  danger  of 
explosion.  T  F 

58.  The  wick  to  an  oil  stove  should  be  turned  up  as  far  as
possible  before  lighting  it.  T  F

59.  A  shopping  list  with  approximate  prices  and  quantities
is  a  great  hindrance  to  efficient  shopping.  T  F

60.  It  is  more  expensive  to  buy  perishable  goods  in  large 
quantities.  T  F 

61.  Food  which  is  "in  season"  is  always  more  economical
than  food  which  is  "out  of  season."  T  F

62.  Thoughtful  consideration  of  tradespeople  is  not  a
good  policy.  T  F

63.  It  is  inadvisable  to  have  stated  periods  for  settling 
household  accounts.  T  F 

64.  The  quantity  of  food  needed  for  a  given  number  of 
people  may  usually  be  found  from  recipes  in  a  cook 
book.  T  F 

65.  Milk  is  sometimes  the  means  of  carrying  and  spreading
disease.  T  F

66.  Turpentine  will  remove  paint  spots  from  a  glass
window.  T  F

67.  If  water  splashes  against  the  baseboard  in  scrubbing
it  may  turn  the  varnish  white.  T  F

68.  Lamb  is  less  nutritious  than  mutton.  T  F 

69.  Fresh  oysters  are  good  to  eat  the  year  around.  T  F 

70.  Fresh  pork  should  be  purchased  in  the  warm  months
of  the  year.  T  F

71.  Liver,  kidney  and  tripe  should  be  used  immediately
after  purchasing.  T  F

72.  Young  fowls  may  usually  be  purchased  for  less  money
per  pound  than  old  fowls.  T  F

73.  Fresh  eggs  will  float  if  put  into  water.  T  F

74.  The  most  desirable  potatoes  are  those  having  many
deep  eyes.  T  F

75.  Very  small  potatoes  lose  a  great  deal  of  weight  in  the
peeling.  T  F

76.  If  silver  spoons  are  left  in  fried  eggs  for  several  hours
they  will  tarnish.  T  F

77.  Fruit  should  always  be  stored  in  a  dark,  cool  place.  T  F 

78.  A  good  paste  may  be  made  from  flour  and  water.  T  F 

79.  When  packed  in  salt,  eggs  will  keep  fresh  for  a  long 
period  of  time.  T  F 

80.  When  storing  soap,  care  should  be  taken  to  stack  the
bars  so  air  may  pass  between  them.  T  F

81.  Hard  and  dry  candles  are  less  liable  to  burn  away
quickly  than  soft  candles.  T  F

82.  Plates  should  always  be  removed  from  the  right-hand
side  of  the  diner.  T  F

83.  Sand  is  a  good  scour  for  saucepans.  T  F 

84.  Vinegar  and  hot  water  will  remove  the  smell  of  onions
from  saucepans.  T  F

85.  Sugar  and  water  boiled  together  will  make  an  excellent
syrup  for  griddle  cakes.  T  F

86.  Nickel  saucepans  are  excellent  for  cooking  sweets.        T  F 

87.  The  use  of  soda  to  clean  aluminum  will  turn  the  metal 
black.  T  F 

88.  Pine  tar  bags  are  good  moth  preventives.  T  F 

89.  Ammonia  is  good  for  cleaning  linoleum.  T  F 

90.  It  is  not  a  good  plan  to  remove  pictures  from  the  walls 
before  cleaning  the  walls.  T  F 

91.  Backs  of  brushes  are  sometimes  made  from  tortoise 
shell.  T  F 

92.  It  is  a  good  practice  to  dry  a  hair  brush  near  an  open
fire.  T  F

93.  Combs  are  sometimes  made  from  gutta-percha.  T  F 

94.  It  is  not  a  good  plan  to  keep  household  brushes
hanging  up  when  not  in  use.  T  F

95.  A  whisk  broom  is  a  long  handled  household  broom.       T  F 

96.  Dust  should  never  be  emptied  from  a  carpet  sweeper 
immediately  after  using  it.  T  F 

97.  A  vacuum  cleaner  is  sometimes  used  in  place  of  a 
broom.  T  F 

98.  A  whisk  broom  and  a  carpet  broom  are  intended  to 
clean  the  same  articles.  T  F 

99.  It  is  economical  to  purchase  cheap  brushes  because
they  will  not  need  to  be  replaced  frequently.  T  F

100.  Sponges  are  manufactured  from  the  pulp  of  certain 
trees.  T  F 

101.  Coarse  sponges  are  less  expensive  than  soft  sponges.    T  F 

102.  Sponges  should  never  be  rinsed  after  using  them  with 
soap.  T  F 

103.  Chamois  is  the  softest  household  leather  obtainable.    T  F 

104.  Chamois  should  always  be  washed  in  cold  water.         T  F 

105.  A  warm,  dry  room  is  the  most  preferable  place  for 
keeping  household  linens.  T  F 

106.  Linen  damask  is  a  figured  fabric  sometimes  used  for 
curtains.  T  F 

107.  Sheets  are  sometimes  made  from  unbleached  cotton 
material.  T  F 

108.  Linen  huckaback  is  a  good  material  for  bedroom 
towels.  T  F 

109.  Hand  towels  made  from  cotton  are  more  serviceable
than  those  made  from  linen.  T  F

110.  Face  towels  and  bath  towels  are  usually  about  the
same  size.  T  F

111.  Mending  the  laundry  before  putting  it  away  is  not
so  satisfactory  as  mending  it  only  as  it  is  used.  T  F

112.  Selvage  threads  will  break  easily  when  pulled.  T  F

113.  Weft  threads  pass  across  the  warp  from  side  to  side  of
the  material.  T  F

114.  A  mangle  is  a  machine  used  to  iron  flat  work.  T  F 

115.  Freshly  ironed  clothes  are  sometimes  put  on  a  clothes 
horse.  T  F 

116.  Borax  and  water  should  be  used  to  remove  tea  stains.    T  F 

117.  Clothes  should  never  be  hung  with  the  thick  part 
uppermost.  T  F 

118.  Silk  should  always  be  wrung  thoroughly  with  the 
hands  when  washing  it.  T  F 

119.  When  a  person  has  burned  himself  badly,  olive  oil  and
borax  solution  is  a  good  remedy.  T  F

120.  Iodine  applied  too  frequently  to  the  body  will  blister
the  skin.  T  F

121.  Receipts  for  money  should  always  be  destroyed  at
once.  T  F

122.  A  pass  book  may  be  purchased  from  a  bank  at  a  small
cost.  T  F

123.  In  sewing,  a  pattern  should  never  be  used  when  the 
garment  is  cut  on  the  fold  of  the  material.  T  F 

124.  A  garment  may  sometimes  be  cut  larger  than  the
pattern  by  making  a  fold  in  the  material  before  cutting.  T  F

125.  Garments  cannot  be  cut  very  accurately  when  the 
pattern  is  pinned  to  the  material.  T  F 

126.  Neck  and  wrist  bands  and  belts  should  always  run  on
the  selvage  of  the  material.  T  F

127.  Basting  is  a  permanent,  durable  stitch.  T  F 

128.  When  the  neck  of  a  garment  is  too  large,  it  may  sometimes
be  made  to  fit  by  making  small  tucks.  T  F

129.  When  making  a  French  seam,  you  always  make  the
first  seam  on  the  wrong  side  of  the  material.  T  F

130.  Thread  size  100  is  the  coarsest  thread  made.  T  F 

131.  Thread  size  16  is  a  good  size  for  working  buttonholes
in  baby  clothes.  T  F

132.  When  sewing  lace  onto  an  edge  always  hold  the  lace
next  to  you.  T  F

133.  Veal  has  higher  food  value  than  beef.  T  F

134.  Peanuts  grow  on  the  roots  of  peanut  plants.  T  F 

135.  Parsley  is  a  vegetable  which  grows  on  a  vine.  T  F 

136.  Orangeade  is  a  sort  of  jelly  made  from  the  rinds  of 
oranges.  T  F 

137.  Olive  oil  is  sometimes  used  as  a  sort  of  medicine.  T  F 

138.  The  largest  pods  of  okra  are  the  most  tender.  T  F 

139.  Oats  is  a  well-known  cereal.  T  F 

140.  A  napkin  is  a  sort  of  towel  used  at  the  table.  T  F 

141.  Some  soaps  contain  naphtha.  T  F 

142.  Fresh  fruits  are  digested  more  quickly  than  meats.  T  F

143.  To  make  table  linen  very  smooth,  iron  it  when  perfectly
dry.  T  F

144.  Pieces  of  much  worn  linen  make  excellent  bandages.    T  F 

145.  Lamp  chimneys  and  globes  should  always  be  washed
in  clear,  cold  water.  T  F

146.  Woolen  cloths  are  better  than  cotton  cloths  for
polishing.  T  F

147.  Varnished  furniture  should  always  be  cleaned  with  a 
damp  cloth.  T  F 

148.  Before  burning  sulphur  in  a  room,  all  metal  articles 
should  be  removed  from  the  room.  T  F 

149.  A  piano  keeps  its  tone  better  in  a  steam-heated
apartment  than  in  one  with  hot-air  heat.  T  F

150.  To  keep  cut  flowers  fresh  keep  them  in  a  very  warm 
room.  T  F 

151.  Partly  ripe  tomatoes  will  ripen  quickly  if  put  in  a 
sunny  window.  T  F 

152.  Partly  ripe  bananas  will  ripen  best  if  wrapped  in 
paper  and  put  in  the  dark  for  a  day  or  so.  T  F 

153.  Beading  is  a  strip  of  cloth  trimmed  with  beads.  T  F 

154.  Alfalfa  is  a  woolen  material  used  for  suits  and  dresses.    T  F 

155.  A  barrette  is  a  removable  bar  in  a  gate.  T  F 

156.  A  brassiere  is  a  kind  of  cooking  kettle.  T  F 

157.  Sulphur  is  added  to  rubber  to  preserve  the  elasticity
of  the  rubber.  T  F

158.  Milk  bottles  are  best  cleaned  by  washing  in  clear,  hot 
water  only.  T  F 

159.  A  cleaver  is  used  to  fasten  together  two  pieces  of 
board.  T  F 

160.  Cluny  is  a  kind  of  lace.  T  F 

161.  A  coffer  is  a  pot  for  making  coffee.  T  F 

162.  Canned  currants  will  keep  if  mashed  and  added  to  the 
same  weight  of  sugar  without  cooking.  T  F 

163.  A  fowl  should  be  cooked  immediately  after  killing  it.    T  F 

164.  A  Welsh  rarebit  is  a  steak  only  slightly  cooked.  T  F 


CHAPTER  V 


TESTS  OF  ABILITY  WITH  CLERICAL  ITEMS  AND
PROCEDURES¹

Our  work  on  clerical  tests  began  with  the  evaluation  of  a  series 
which  had  been  given  in  1920  by  Dr.  L.  W.  Sackett  to  soldiers  in 
the  Camp  Grant  schools. 

From  these  data  obtained  for  bookkeepers  in  the  army  school  we 
were  able  to  use  the  partial  correlation  method  in  the  weighting  of 
nine  variables  to  form  a  provisional  bookkeeping  placement
examination.  This  was  never  used,  since  work  was  immediately
begun  on  a  thorough  revision  of  a  set  of  tests  later  called  the  Unit 
Tests,  which  could  be  given  in  a  shorter  time  limit,  and  could  be 
quickly  and  more  objectively  scored. 

The  criterion  of  ability  to  progress  consisted  of  four  variables:

I.  Teacher's  estimates  of  "potential  ability."  The  teacher
rated  the  pupils  in  letters,  which  were  later  arbitrarily  given
numerical  scores  as  follows:

A  =  10;  B  =  8;  C+  =  6;  C  =  5;  C-  =  4;  D  =  2;  E  =  0.

II.  A  combination  was  made  of  the  arithmetical  sum  (thus
weighting  each  directly  according  to  its  σ)  of  teachers'  estimates  of
"morale  or  interest,  intelligence,  mathematics,  and  language,"  the
ratings  being  each  given  on  a  scale  of  1  to  5.  The  arithmetical
sums  were  arbitrarily  given  credit  on  a  scale  of  0  to  10  as  follows:

Sums  of  . . . .    0-4    5    6    7    8-9    10-11    12-13    14-15    16    17    18-20
Credit  given  .     0     1    2    3     4       5        6        7      8     9     10
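
The conversion in the table above can be sketched as a simple lookup. This is only an illustration of the arithmetic: the names `SUM_BANDS` and `credit_for_sum` are hypothetical and do not come from the original study, and the four sample ratings are invented.

```python
# Upper limit of each band of sums, paired with the credit it earns,
# transcribed from the table above. Names are illustrative only.
SUM_BANDS = [(4, 0), (5, 1), (6, 2), (7, 3), (9, 4), (11, 5),
             (13, 6), (15, 7), (16, 8), (17, 9), (20, 10)]


def credit_for_sum(ratings):
    """Convert the four teacher ratings (each 1 to 5) into a 0-to-10 credit."""
    total = sum(ratings)
    for upper, credit in SUM_BANDS:
        if total <= upper:
            return credit
    raise ValueError(f"sum {total} lies outside the table (0 to 20)")


# Invented ratings for morale or interest, intelligence, mathematics, language.
print(credit_for_sum([3, 4, 2, 5]))  # the sum 14 falls in the 14-15 band
```

Because the four estimates are merged into one credit before the criterion is formed, each of them individually carries much less weight than any of the other three criterion variables.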

III.  School  marks  on  a  scale  of  0  to  10.

IV.  Measures  of  absolute  school  progress. 

These  were  lengths  of  line  in  a  grade  book,  the  lengths  of  line
indicating  the  relative  progress  in  different  "fundamental  elements
or  operations"  of  bookkeeping  theory.  With  equal
practice,  those  who  made  the  largest  amount  of  absolute  progress
were  farthest  along  toward  final  or  graduating  proficiency.  These
lengths  of  line  were  mechanically  summated  by  means  of  a  pair  of
dividers,  and  scores  were  given,  on  a  basis  of  0  to  10,  for  total
lengths  of  line  in  units  as  follows:

22-34  =  0;  35-64  =  2;  65-168  =  5;  169-187  =  8;  188-214  =  10.


1  The  reader  who  is  interested  only  in  the  results  finally  attained  may  omit  all  of 
this  chapter  save  the  pages  which  report  the  methods  of  selection  of  tests  for  the 
I.E.R. General Clerical Test (pp. 73 to 84), and the results with New York City
school children (pp. 96 to 99).

It will be noted that all of the above variables have been
changed into ranks on a basis of 0 to 10, assuming a normal
distribution. The σs of the four variables are thus practically equal.
The scores were then added. Thus, with σs equal, extreme
weight was given to school marks, measures of absolute progress,
and estimates of potential ability as against the weight given to
interest, intelligence, mathematics and language. The summated
scores were all divided by 3 to yield smaller numbers for ease in
calculating correlations. These scores become variable 1, the
criterion, in the following discussion.
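
The arithmetic of this criterion can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the authors' own procedure sheet: the function names and the sample pupil are hypothetical, and the credit bands for the sums of ratings are taken as read from the table of sums and credits given under variable II.

```python
# Sketch of the four-part bookkeeping criterion (variable 1).
# Function names and the sample pupil are illustrative assumptions.

# Variable I: teacher's letter estimate of "potential ability".
LETTER_SCORE = {"A": 10, "B": 8, "C+": 6, "C": 5, "C-": 4, "D": 2, "E": 0}

# Variable II: (low, high, credit) bands for the sum of four 1-to-5 ratings.
SUM_CREDIT = [(0, 4, 0), (5, 5, 1), (6, 6, 2), (7, 7, 3), (8, 9, 4),
              (10, 11, 5), (12, 13, 6), (14, 15, 7), (16, 16, 8),
              (17, 17, 9), (18, 20, 10)]

# Variable IV: (low, high, credit) bands for total length of line.
LINE_CREDIT = [(22, 34, 0), (35, 64, 2), (65, 168, 5), (169, 187, 8),
               (188, 214, 10)]


def band_credit(value, bands):
    """Return the credit of the band that contains value."""
    for low, high, credit in bands:
        if low <= value <= high:
            return credit
    raise ValueError(f"{value} lies outside the tabled ranges")


def criterion_score(letter, ratings, school_mark, line_length):
    """Add the four 0-to-10 scores and divide the sum by 3."""
    part_1 = LETTER_SCORE[letter]
    part_2 = band_credit(sum(ratings), SUM_CREDIT)
    part_3 = school_mark                      # variable III, already 0 to 10
    part_4 = band_credit(line_length, LINE_CREDIT)
    return (part_1 + part_2 + part_3 + part_4) / 3


# Hypothetical pupil: estimate B, ratings 4, 3, 4, 4, mark 7, line length 170.
example = criterion_score("B", [4, 3, 4, 4], 7, 170)
```

For this invented pupil the four parts are 8, 7, 7 and 8, so the criterion score is 30 divided by 3.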

The  variables  evaluated,  by  variable  numbers,  are: 

1.  Criterion  of  bookkeeping. 

2,  3,  4.  Tests  not  evaluated. 

5.  A  Trabue  Completion  test  of  the  usual  form  made  from 
selected  Trabue  sentences.  This  was  test  5  of  the  Sackett  Series. 
It  was  later  revised,  use  being  made  of  a  new  principle  in  the 
reactions  of  subjects  whereby  the  initial  letter  of  the  word  to  be 
completed  is  given,  and  becomes  test  4  of  the  Unit  Tests. 

6.  Substitution  of  code  letters  for  numbers  as  in  store  price 
codes.  This,  when  revised,  becomes  test  9  of  the  Unit  Tests. 
This  was  test  6  of  the  Sackett  Series. 

7.  Filing.  Words  are  marked  with  the  number  of  the  letter 
group  under  which  they  would  be  filed  in  the  letter  scale.  This, 
when  revised,  becomes  test  10  of  the  Unit  Tests.  This  was  test  7 
of  the  Sackett  Series. 

8. Copy Checking. A test to detect and correct errors in copying
totals of arithmetical additions to a vertical column, and in
checking the correct transfers. This becomes test 7 of the Unit
Tests. This was test 8 of the Sackett Series.

9, 10. Tests not evaluated.

11. Number Copying. A test of copying numbers of increasing
number  of  digits  into  blank  spaces  on  the  back  side  of  the  test 
sheet.  This  is  test  11  in  both  the  Sackett  Series  and  the  Unit 
Tests. 

12.  Test  not  evaluated. 

13.  Army  Alpha.    Form  6. 

14.  Army  Arithmetic  Test.  A  20-minute  test  in  the  four 
fundamentals,  modeled  after  the  Woody  Arithmetic  Tests. 

15.  Army  Reading  Test.    Time,  14  minutes. 

16.  Mechanical  Interest  Test.  A  test  aiming  to  measure 
familiarity  with  the  use  of  common  mechanical  tools.  This  test 
was  given  as  a  routine  test  for  placement  of  soldiers  in  mechanical 
courses.    No  time  limit;  about  a  45-minute  examination. 

There were 27 cases in the bookkeeping group. All scores in all
the variables were used as available, i.e., as ratings comparable to
the Army Alpha letter ratings arbitrarily given scores as follows:

A = 10; B = 8; C+ = 6; C = 5; C- = 4; D = 2; E = 0.

The intercorrelations, together with σs and averages, are shown
in Table XV.

TABLE XV

The Intercorrelations, Standard Deviations and Averages of 27
Bookkeeping Students in the Several Variables of the Text *

(The numbers in parentheses are the numbers of the variables
of the Unit Test Series.)

Variable                 No.      1      5      6      7      8     11     13     14     15     16

Criterion                  1     --    .39    .51    .53    .52    .54    .40    .36    .36    .20
Trabue Completion (4)      5    .39     --    .17    .40    .38    .22    .20    .23    .27    .13
Substitution (9)           6    .51    .17     --    .26    .29    .51    .38    .51    .41    .28
Filing (10)                7    .53    .40    .26     --    .55    .41    .26    .47    .22    .02
Copy Checking (7)          8    .52    .38    .29    .55     --    .57    .15    .24    .21    .33
Number Copying (11)       11    .54    .22    .51    .41    .57     --    .29    .49    .39    .22
Army Alpha                13    .40    .20    .38    .26    .15    .29     --    .55    .83    .39
Army Arithmetic           14    .36    .23    .51    .47    .24    .49    .55     --    .40    .18
Army Reading              15    .36    .27    .41    .22    .21    .39    .83    .40     --    .29
Mechanical Interest       16    .20    .13    .28    .02    .33    .22    .39    .18    .29     --

σ                              2.782  1.852  1.078  1.633  2.622  1.246  1.805  2.077  1.690  2.043
Average                         7.74   6.44   5.15   5.18   4.70   5.33   7.82   7.78   7.74   6.78

* The Probable Error of the Correlation Coefficients in Table XV may be found from the
following table:

P.E.r when r is:

 N      0    ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9

27    .13    .13    .12    .12    .11    .10    .08    .07    .05    .02
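
The text does not state how this footnote table was computed. As a hedged check, the classical formula for the probable error of a correlation coefficient, P.E.r = .6745 (1 - r²) / √N, reproduces the tabled values for N = 27. The function name below is illustrative only.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Probable errors for r = 0, .1, ..., .9 with the 27 bookkeeping cases,
# rounded to two places as in the footnote table.
pe_row = [round(probable_error_r(k / 10, 27), 2) for k in range(10)]
```

The same formula with N = 13 gives about .19 for correlations near zero, which agrees with the probable errors quoted for the thirteen stenographers.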
The regression equation resulting is as follows:

$$\frac{X_1}{\sigma_1} = .179\frac{X_5}{\sigma_5} + .312\frac{X_6}{\sigma_6} + .352\frac{X_7}{\sigma_7} + .331\frac{X_8}{\sigma_8} + .389\frac{X_{11}}{\sigma_{11}} + .143\frac{X_{13}}{\sigma_{13}} + .045\frac{X_{14}}{\sigma_{14}} + .064\frac{X_{15}}{\sigma_{15}} - .025\frac{X_{16}}{\sigma_{16}}; \qquad r_{IC} = .72 \pm .06.$$

The terms are, in order: Trabue Completion (Unit Test 4), Substitution
(Unit Test 9), Filing (Unit Test 10), Copy Checking (Unit Test 7),
Number Copying (Unit Test 11), Army Alpha, Army Arithmetic, Army
Reading, and Mechanical Interest.

It  is  interesting  to  note  that  not  only  do  the  placement  tests  for 
academic  subjects,  Alpha  (13),  Arithmetic  (14),  and  Reading  (15) 
correlate  lower  with  the  criterion  than  do  the  clerical  tests, 
variables 5, 6, 7, 8 and 11, but also that the importance of each of
these  academic  tests  is  less  than  the  least  valuable  of  the  clerical 
tests  in  the  regression  equation. 

The magnitude of the multiple correlation coefficient, $r_{IC} = .72 \pm .06$,
is such as to indicate much promise from a composite series
of such variables in placing people in bookkeeping courses.

Since it is apparent that the Army Alpha, Arithmetic and
Reading tests add but little value to the efficiency of the
placement relative to the amount of time taken, it seems desirable to
find the multiple correlation coefficient for the clerical tests alone,
since these can be given in a fraction of the time required for all
the test variables. The five clerical tests alone, weighted by the
same regression weights, yield $r_{IC} = .70 \pm .07$. This would be
slightly improved upon were one to calculate the new regression
weights, involving the five variables only.
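
The two coefficients just quoted can be checked from the intercorrelations. The sketch below treats each coefficient as the correlation between the criterion and a weighted sum of standard scores, namely the sum of w_i r_1i divided by the square root of the double sum of w_i w_j r_ij. That reading of the authors' computation is an assumption, and the matrix is entered as read from Table XV. All names are illustrative.

```python
import math

# Variable numbers in the order of Table XV (1 is the criterion).
VARS = [1, 5, 6, 7, 8, 11, 13, 14, 15, 16]

# Intercorrelations as read from Table XV, with 1.0 on the diagonal.
R = [
    [1.0, .39, .51, .53, .52, .54, .40, .36, .36, .20],
    [.39, 1.0, .17, .40, .38, .22, .20, .23, .27, .13],
    [.51, .17, 1.0, .26, .29, .51, .38, .51, .41, .28],
    [.53, .40, .26, 1.0, .55, .41, .26, .47, .22, .02],
    [.52, .38, .29, .55, 1.0, .57, .15, .24, .21, .33],
    [.54, .22, .51, .41, .57, 1.0, .29, .49, .39, .22],
    [.40, .20, .38, .26, .15, .29, 1.0, .55, .83, .39],
    [.36, .23, .51, .47, .24, .49, .55, 1.0, .40, .18],
    [.36, .27, .41, .22, .21, .39, .83, .40, 1.0, .29],
    [.20, .13, .28, .02, .33, .22, .39, .18, .29, 1.0],
]

# Weights of the regression equation, keyed by variable number.
WEIGHTS = {5: .179, 6: .312, 7: .352, 8: .331, 11: .389,
           13: .143, 14: .045, 15: .064, 16: -.025}


def composite_r(weights):
    """Correlation of the criterion with a weighted sum of standard scores."""
    idx = {v: i for i, v in enumerate(VARS)}
    covariance = sum(w * R[0][idx[v]] for v, w in weights.items())
    variance = sum(wa * wb * R[idx[a]][idx[b]]
                   for a, wa in weights.items()
                   for b, wb in weights.items())
    return covariance / math.sqrt(variance)


r_nine = composite_r(WEIGHTS)
r_five = composite_r({v: WEIGHTS[v] for v in (5, 6, 7, 8, 11)})
```

Under these assumptions both values fall close to the reported coefficients. The nine-test figure comes out slightly under .72, which may reflect rounding in the published correlations and weights.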

Thus  we  might  take  a  short  examination  composed  of  Unit  Tests 
Nos.  4,  9,  10,  7  and  11  and  with  this  group  secure  a  correlation  of 
.70  between  the  criterion  of  bookkeeping  ability  and  the  composite 
score  weighted  according  to  the  weights  of  the  above  regression 
equation,  provided  the  revision,  not  only  in  the  alteration  of  the 
form  of  the  tests  but  of  the  time  limit,  does  not  operate  to  destroy 
the  above  existing  interrelationships  of  the  tests  as  given  in  the 
table  of  intercorrelations. 

The  results  of  this  experiment  were  so  encouraging  that  a 
thoroughgoing revision of the tests was undertaken, the subjects
being the pupils of a large business college. These students are of
a  more  selected  range  of  ability,  have  carefully  kept  records,  and 
are  quite  typical  of  the  type  of  persons  who  enter  industry  through 
the  route  of  a  business  college  training  in  typing,  stenography,  or 
bookkeeping.  It  is  quite  evident  that  with  a  group  of  more 
limited  range  of  ability,  lower  correlations  will  be  found  for  equally 
meritorious  tests;  but,  if  these  are  computed  on  a  more  reliable 
criterion,  they  are  preferable  to  the  larger  correlations  obtained 
on  the  soldier  group.  Again,  the  five  tests  here  combined  may  not 
be  at  all  meritorious  in  predicting  ability  in  stenography  and 
typing.  The  correlations  of  school  marks  in  stenography  with  the 
regression  prediction  of  bookkeeping  fitness  total  scores  on  both 
the  entire  set  of  nine  tests  and  the  shorter  set  of  five  tests  for 
thirteen students in the army school of stenography were computed
with the results of Table XVI.


TABLE XVI

Correlations of Weighted Test Scores with Marks in Bookkeeping
and in Stenography, Camp Grant Soldiers

Group                            No. of Cases    r_IC, All Nine Tests    r_IC, with Only Five Tests

Criterion of Bookkeepers              27              .72 ± .06                  .70 ± .07
Stenographers' School Marks           13              .04 ± .19                  .01 ± .19

The  five  tests  as  above  weighted  yield  a  correlation  of  .01 
between  school  marks  and  the  test  composite  in  the  case  of  the  13 
army  students  in  stenography;  and  the  nine  tests  a  correlation  of 
.04,  thus  proving  these  weightings  to  have  no  value  for  predicting 
these  stenographic  marks.  Different  tests  or  different  weights  of 
the  present  tests  are  needed;  or,  probably  the  reliability  and  the 
validity  of  the  stenographic  school  marks  are  at  fault. 

For  practical  work,  the  procedure  should  be  simplified  to  save 
the  necessity  of  making  so  many  transformations  of  data  before 
obtaining  the  final  fitness  scores. 

The  Unit  Tests 

It became evident that for army use, and for vocational guidance
use in general, the tests must be of such a nature as to be
given by teachers or other persons not specifically trained in
psychological test techniques. Consequently, it was desirable
to reduce to the minimum the amount of necessary preliminary
training on the part of the examiner. This necessitated first of all
that  the  directions  should  be  read  by  the  subjects  rather  than  by 
the  examiner,  if  uniform  results  were  to  be  secured  from  the  work 
of  different  examiners.  It  was  decided  that  the  directions  should 
be  printed  as  a  part  of  the  test  and  included  in  the  test  time,  to  be 
read  silently  by  the  test  subject,  and  work  on  the  test  to  be  begun 
just  as  soon  as  he  had  completed  the  directions.  This  means  that 
each  test  has  in  it  the  element  of  ability  to  understand  printed 
directions.  Whether  or  not  this  is  an  advantage  or  disadvantage 
remains to be proved. At least one may say that all of the conventional
tests involve the requirement of ability to understand
verbal  directions.  The  desirability  of  having  the  directions  a  part 
of  the  test,  rather  than  the  usual  verbal  directions  read  by  the 
examiner,  is  a  question  to  be  settled,  not  on  the  basis  of  a  priori 
reasoning,  but  rather  on  the  basis  of  the  correlations  to  be  obtained 
with  an  adequate  criterion.  The  necessary  experiment  has  not 
been  performed.  Lack  of  ability  to  understand  the  directions  of 
a  test  will  result  in  too  many  zero  scores.  In  industry  we  have 
noted  not  a  single  case  of  a  person  working  in  clerical  work  who 
had  not  at  least  a  sixth  grade  education ;  consequently,  the  reading 
ability  of  clerical  workers  in  general  is  undoubtedly  superior  to 
sixth  grade  reading  ability.  This  makes  the  argument  for  the 
necessity  of  verbal  directions,  given  by  the  examiner,  rather 
ineffective. 

It  would  also  be  very  desirable  if  the  test  could  be  given  with  an 
over-all  time  limit,  that  is,  by  the  work-limit  method.  In  a 
vocational  guidance  bureau,  for  instance,  the  applicants  are  likely 
to  come  straggling  in  one  at  a  time  and  it  would  be  quite  laborious 
for  the  examiner  to  keep  records  of  the  time  on  each  test.  An 
empirical  formula  for  changing  time-limit  tests  over  to  a  work- 
limit  method,  by  consideration  of  the  partial  correlation  weights 
and  the  variabilities  of  the  different  tests,  has  been  considered. 
If  such  a  technique  should  prove  successful,  one  would  be  able  so 
to  vary  the  number  of  test  elements  in  each  of  the  tests  that  the 
tests  would  be  properly  weighted  with  respect  to  each  other  in  the 
total  scale  by  means  of  the  relative  number  of  elements  in  the 
several tests. Neither sufficient time nor adequate data have
been available for carrying this investigation to its conclusion.
Consequently, time limits for the several tests have been established
by choosing such a time limit that over 99 per cent of
exceptionally  good  business  college  students  are  unable  to  finish 
in  the  time  limit  assigned. 

A  ranking,  by  a  number  of  test  workers  and  psychologists,  of 
tests  in  their  order  of  predictive  value  in  predicting  stenographic 
ability  had  demonstrated  the  fact  that  there  is  little  agreement 
among  such  test  workers  in  regard  to  the  relative  validities  to  be 
expected from the different tests. Nearly all expressed themselves
as unwilling to make a prediction of the correlation that
might  be  obtained  between  the  various  tests  and  an  adequate 
criterion  of  stenographic  ability.  It  seemed  desirable,  therefore, 
to  construct  a  large  number  of  tests  varying  from  very  routine  and 
non-verbal  on  the  one  hand  to  very  abstract  and  academic  on  the 
other,  and  then  to  administer  these  to  a  group  of  business  college 
students  upon  whom  adequate  criteria  of  ability  to  progress  in 
acquiring  these  subjects  could  be  obtained.  From  this  conviction 
it  was  but  a  step  to  the  development  of  the  Unit  Test  idea,  already 
foreshadowed  in  part  by  the  work  of  Link. 

As  originally  constructed,  the  Unit  Test  series  consisted  of  32 
tests,  each  of  which  had  the  directions  included  as  a  part  of  the 
test.  The  series  ranged  in  test  content  from  very  routine  to  very 
abstract  material.  Each  test  was  given  a  number  rather  than  a 
name. 

The  original  Unit  Test  number  is  always  to  be  printed  or 
mimeographed  at  the  left  of  the  scoring  box  on  any  Unit  Test,  and 
will  thus  enable  adequate  comparisons  of  records  to  be  made  of 
test  scores  when  the  tests  are  given  in  different  scale  combinations. 
Thus,  this  plan  purported  to  be  the  beginning  of  a  test  plan  which 
might  extend  over  a  period  of  years.  Once  the  tests  which  were 
to  make  up  the  Unit  Test  series  were  determined  upon,  adequate 
time  limits  determined,  clear  directions  decided  upon,  the  tests 
might  be  made  up  in  mimeographed  form  and  kept  available  in 
large  numbers  on  the  shelves  of  the  laboratory.  Since  a  general 
set  of  directions  (and  qualification  questionnaire  sheet)  would  also 
apply,  it  would  be  possible  upon  an  hour's  notice  to  assemble  into 
a  test  booklet  any  selection  of  15  to  30  tests,  or  more,  that  might 
in  the  judgment  of  the  experimenter  be  expected  to  yield  positive 
correlations with the particular test criterion for which one might
be called upon to construct a potential ability scale. Dr. Sackett
and  the  writer  had  the  assistance  for  final  review  in  this  work  of 
Drs.  Otis,  Holley,  Rice,  O'Rourke  and  Teachout.  The  series 
was  subsequently  enlarged  by  the  addition  of  mechanical  tests, 
reading  tests  and  others,  to  contain  42  tests. 

Numerous  additional  requirements  were  decided  upon,  most  of 
these  being  considerations  of  the  mechanical  make-up  of  the 
printed  or  mimeographed  page  for  ease  and  speed  in  scoring  and 
to  secure  compliance  with  the  directions  on  the  part  of  the  test 
subjects.  For  instance,  in  those  tests  where  the  subjects  are 
inclined  to  work  across  the  page  rather  than  in  columns  according 
to  directions,  it  has  been  found  that  a  heavy  vertical  line  dividing 
the  page  into  columns  will  tend  to  cause  the  subject  to  work  in 
columns  rather  than  in  rows.  Innovations  or  improvements  upon 
past  test  technique  used  in  these  tests  are  given  below : 


1. Scoring boxes (marked A, W, R) are provided in the lower
right hand corner of each mimeographed page, allowing a
ready tabulation of attempts, wrongs and rights on each test.
To the left of this box is placed the original Unit Test number
of the test. This is always the same for a given test in whatever
combination or order it may be used, allowing a quick
comparison of results on the one administration with others at
previous or subsequent times. (The Unit Tests may be
numbered serially at the top of the pages, according to the
order in which they appear in a given scale.)

2. Where the subject is to work in columns, rather than in rows
across the page, heavy vertical lines automatically guide the
subject to work in the columns as versus the rows.

3. The administration time of each test is printed in the same
position near the top of the page in each test, so that the test
subject may secure a relative idea of the speed required on the
test.

4. In all tests which require such, samples are always given
immediately beneath the directions and administration time.
It is the aim to have three samples on each test, the first two
being easy samples and the last more difficult, in case there
is a varying difficulty in the items. This most difficult sample
should be approximately as difficult as the most difficult test
reaction which the subject will have to make. The answers
to the sample questions which indicate the reaction to be
performed by the test subject are always written in script in order
to indicate to the subject the proper place for his answer.

5. The samples always appear relatively at the same place on the
page and are set off from the directions above and the test
beneath by heavy horizontal lines; the test subject thus is
never in doubt as to where to look for the sample reactions.

6.  Practice  pages,  with  a  short  time  limit,  are  given  for  such  tests 
as  require  involved  explanation  of  the  directions. 

7.  At  the  foot  of  each  page  appears  in  italics  the  direction: 
Wait  for  the  signal  before  turning  the  page.  Where  there  are 
two  pages  to  a  test  this  signal  has  been  changed  to  Turn  over 
to  the  next  page.  More  on  the  next  page.  It  has  been  the 
attempt  throughout  to  anticipate  the  points  at  which  the  test 
subject  might  wish  to  have  a  question  answered  and  to  have 
the  answer  to  his  question  printed  on  the  test  blank  at  just 
that  point. 

8.  If  the  test  elements  are  not  numbered  on  a  test,  the  maximum 
credit  in  terms  of  cumulative  number  of  possible  rights  in  each 
column  are  printed  at  the  foot  of  the  columns. 

9.  In  all  cases  where  possible,  the  questions  are  numbered  on 
both  the  left  and  the  right  margins  of  the  page  for  ready 
reference  in  scoring.  At  the  right-hand  side  of  the  page,  the 
question  numbers  are  staggered  sufficiently  to  allow  the  odd- 
numbered  questions  to  appear  in  a  column  on  the  left  and  the 
even-numbered  questions  in  a  column  on  the  right.  This 
allows  one  to  compute  the  reliability  of  a  test  by  the  odds- 
evens  method,  since  the  "wrongs"  or  the  "rights"  in  either 
column  may  be  readily  added  up  at  a  glance.  Where  the 
subject  is  to  place  his  answer  on  a  short  horizontal  line  in 
columns,  the  answer  spaces  are  also  staggered,  thus  allowing 
more  writing  space  for  the  answer. 

10.  At  the  completion  of  the  test  material  and  immediately  above 
the  "Wait  for  the  signal"  sign,  a  heavy  horizontal  line  is 
always  placed,  indicating  that  the  test  is  finished  at  that  point. 

11. Where the subject works in horizontal rows, and frequently in
columns also, the test material is grouped by horizontal spaces
into groups of five test elements. This enables the test subject
better to keep his place, and also is an aid to the scorer in
scoring.

12.  If  there  are  several  test  reactions  in  a  given  horizontal  row,  as 
in  encircling  the  pairs  which  make  10,  the  numbers  placed  in 
parentheses  at  the  left  of  the  rows  indicate  the  cumulative 
sums  of  the  possible  credits  in  the  preceding  rows.  This 
makes  possible  very  quick  computation  of  the  number  of 
attempts. 

13.  In  such  a  test  as  the  above,  where  several  test  reactions  occur 
in  a  row,  an  arbitrary  keying  device  for  determining  the 
credits  in  that  row  is  employed  in  the  right-hand  column  of 
figures  or  letters;  thus  in  a  test  involving  figures  at  the 
extreme right of each successive row, the figures may readily
be made to show the number of possible correct responses in
each  of  the  rows  respectively. 

14.  In  general,  the  tests  aim  always  to  require  an  actual  mark  to 
be  made  in  order  to  receive  any  credit;  i.e.,  the  tests  aim  to 
avoid  the  type  of  test  wherein  one  checks  only  the  errors, 
leaving  blank  those  that  are  right  or  vice  versa.  The  scorer 
is  quite  confused  by  such  procedure.  One  can  readily  have 
the test subject write "C" for those items that are correct and
"W"  or  "X"  for  those  that  are  wrong. 

15.  It  is  very  desirable  that  few  dotted  guide  lines  be  used,  since 
some  subjects  are  prone  to  write  their  answers  on  the  dotted 
line.  Consequently,  it  has  been  the  aim  to  eliminate  these 
wherever  possible  by  always  using  solid  lines  for  the  answer 
space  and  by  so  arranging  the  position  of  the  question,  or 
dividing  it  up  in  questions  extending  onto  two  lines,  that  the 
end  of  the  question  will  be  very  close  to  the  answer  space 
without  any  intervening  guide  lines.  In  multiple  choice  tests 
it  is  very  desirable  to  line  up  the  left-hand  end  of  each  choice 
in  columns,  dividing  the  direction  part  of  the  element  into  two 
lines  if  necessary  and  placing  the  choices  opposite  the  second 
line. 

16.  In  questions  which  extend  over  more  than  one  line,  the  answer 
space  is  always  placed  on  a  level  with  the  last  line  of  the 
question. 
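
The odds-evens method mentioned under item 9 can be sketched as follows. The text says only that the staggered numbering lets the scorer total the odd and even questions separately; correlating the two half scores and stepping the result up by the Spearman-Brown formula is assumed here as the customary way of finishing the computation, and the four answer records are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(records):
    """records: one list per subject, 1 for a right and 0 otherwise,
    in question order. Returns (half-test r, stepped-up r)."""
    odd_scores = [sum(row[0::2]) for row in records]   # questions 1, 3, 5, ...
    even_scores = [sum(row[1::2]) for row in records]  # questions 2, 4, 6, ...
    r_half = pearson_r(odd_scores, even_scores)
    # Spearman-Brown step-up to full test length (an assumption, see above).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Four invented subjects on a six-question test.
records = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = odd_even_reliability(records)
```

With real papers the two column totals printed at the right margin would supply the odd and even scores directly.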

Since adequate criteria of ability to progress could be secured,
it was decided to attempt securing a series of tests which by
differential weights would predict differential capacity to progress in
each  of  the  three  business  college  courses,  stenography,  typing, 
and  bookkeeping.  It  was  also  possible  to  construct  an  average 
criterion,  or  average  of  the  sigma  positions  in  each  of  the  other 
three  courses,  and  so  to  predict  "general  business  ability"  in  much 
the  same  way  as  the  average  of  abilities  in  a  number  of  grade 
school  subjects  is  commonly  called  "general  scholarship." 

The  excellent  criteria  which  have  been  secured  are  the  results 
of  many  days  of  painstaking  effort  in  compiling  from  teachers' 
grade  books,  the  school  register,  the  attendance  book,  and  various 
other  sources,  including  the  results  of  performance  tests  specially 
devised for the purpose, the facts which bear on success in business
college courses. The instructors were unusually cooperative
not  only  in  making  available  their  own  records,  but  in  giving  and 
scoring  performance  tests  and  looking  up  the  records  of  individual 
students.  In  all,  undoubtedly  more  time  was  spent  in  obtaining 
the final criterion scores than was consumed in both giving and
scoring the 32 tests. This fact is mentioned in order to show the
amount  of  work  necessary  to  determine  an  adequate  criterion. 
These  criteria  undoubtedly  have  a  high  statistical  reliability, 
since  they  contain  so  many  items,  each  of  which  represents  the 
results  of  much  practice  on  the  part  of  the  student.  With  a  low 
reliability  of  the  criterion  one  cannot  hope  to  determine  the  best 
series  of  tests  for  predicting  ability  to  progress  in  the  several 
courses.  We  feel  that  we  have  largely  eliminated  this  factor  in 
the  stenographic  criterion;  the  typing  criterion  is  probably  less 
reliable,  and  the  bookkeeping  still  less.  The  order  of  efficiency  of 
the  final  scales  is  in  the  same  order,  which  might  suggest  that  we 
need but to increase the reliability of our criteria in order to
increase the validity of our tests. In fact the maximum correlation
of our weighted scale of tests, C, with the Criterion, I, is given by
the formula, $r_{IC} = \sqrt{r_{II} \cdot r_{CC}}$, in which $r_{II}$ is the reliability
coefficient of the criterion, and $r_{CC}$ the reliability coefficient of the
combination of tests. By increasing the reliability of either the
criterion or the tests we increase the maximum limit of the
correlation of our tests with the criterion, $r_{IC}$. Whether the actual
value of $r_{IC}$ will always, or even usually, increase is not known,
although it seems likely that an increase in $r_{IC}$ will automatically
follow an increase in the reliability of either the criteria or the
tests. It is apparent, however, that there are cases where there
would be no increase, and other cases where the increase would be
small. A test which correlates 0 with a criterion will not correlate
any higher no matter how much the reliability of both the
criterion and the tests is increased.
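
A short numerical sketch may make the ceiling concrete. The reliabilities used below are hypothetical figures chosen for illustration; none of them is reported in the text.

```python
import math


def max_validity(r_criterion, r_tests):
    """Upper limit of r_IC: the square root of the product of the
    reliability of the criterion and the reliability of the test composite."""
    return math.sqrt(r_criterion * r_tests)


# Hypothetical reliabilities: criterion .81, test composite .64.
ceiling = max_validity(0.81, 0.64)

# Raising the criterion reliability to a hypothetical .90 lifts the ceiling,
# though the obtained correlation need not rise with it.
raised_ceiling = max_validity(0.90, 0.64)
```

With these invented figures the limit is .72, and the higher criterion reliability raises it to about .76. As the paragraph above notes, this is a statement about the limit only.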

The  method  of  deriving  the  four  criteria  scores  is  shown  in 
Appendix  II. 

The Selection of Tests for the I.E.R. General Clerical Scale, C-1

After  the  tests  had  been  administered  and  scored  according  to 
the  standard  scoring  directions,  and  after  the  criteria  of  ability  to 
progress  in  the  typing,  stenography  and  bookkeeping  courses  had 
been  computed,  the  correlations  of  the  32  tests  and  also  of  age  and 
last  school  grade  completed  were  computed  with  the  three  criteria 
respectively. These correlations are shown in Table XVII together
with the old time limits (that is, the time limits used in the
B-C Business College, B-M Business College, and B-D Business
School) ;  the  new  time  limits  (those  established  for  the  final  revised 
set  of  tests  with  the  exception  of  test  13,  which  is  given  an  extra 
half  minute  in  the  T.C.  edition  of  the  General  Clerical  Test);  the 
maximum  scores  on  the  several  tests;  the  class  intervals,  (I),  used 
in  transmuting  the  gross  scores  to  small  class  numbers;  the 
standard  deviations  and  averages  by  the  old  time  limits,  for  the 
three  groups,  typing,  stenography  and  bookkeeping,  respectively. 

The table used for transmuting the original gross scores into
transmuted classes, for ease of computation of correlation
coefficients, is shown in Table XVIII. When transmuting the
scores of any test, say Test 1, one uses the column of Table XVIII
which has the same class interval; thus Test 1 has the class
interval 3 and one would use the (I = 3) column of the table. Gross
scores of 12 or 13 or 14 would each be called transmuted scores of
5; scores of 15, 16 or 17 on this test would thus be called
transmuted scores of 6, and so on.
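
The grouping rule implied by this example is that, for class interval I, class c holds the gross scores from (c - 1) × I through c × I - 1. A minimal sketch, with an illustrative function name:

```python
def transmute(gross_score, interval):
    """Return the transmuted class of a gross score for class interval I.

    Class 1 holds scores 0 through I - 1, class 2 the next I scores, and so on.
    """
    if gross_score < 0 or interval < 1:
        raise ValueError("score must be non-negative and interval at least 1")
    return gross_score // interval + 1


# Test 1 has class interval 3: gross scores 12 to 14 fall in class 5,
# and gross scores 15 to 17 fall in class 6.
classes_for_test_1 = [transmute(score, 3) for score in range(12, 18)]
```

The rule agrees with the worked example in the paragraph above for interval 3.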

Scoring formulae, in the case of stenographers, were computed
for each of 31 tests (Test 3 does not have any "Wrongs" and
consequently a scoring formula is impossible). Without entering
into details, it may be generally stated that the result was quite
unsatisfactory, none of the tests yielding enough larger correlations
with the criterion by the use of the scoring formula to justify the
added labor of using a scoring formula. In many cases the scoring
formula gave positive credit for "Wrongs" rather than imposed
penalties. Speed, rather than accuracy, is significant in many of
the tests since the scoring formula, S = R + C × W, exactly equals
"Attempts." This does not at all mean that stenographers do
not need to be accurate. Rather is the explanation analogous to
the kind of handwriting required in everyday life; after one can
write legibly there is little need for one's improving the quality of
his handwriting unless he is a professional copyist or letter
addresser, while increments in the speed are very desirable; thus, if
one who can write legibly could increase his speed up to the point
of 80 words per minute, there would be absolutely no need, in his
case, for employing a stenographer or for using shorthand,
provided he could maintain the speed for some length of time.
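
The relation between the scoring formula and "Attempts" can be illustrated in a few lines. The sketch assumes that every attempted item is marked either right or wrong, so that Attempts = R + W; the weights C actually found for the several tests are not reported here, and the values below are hypothetical.

```python
def formula_score(rights, wrongs, c):
    """Scoring formula S = R + C * W, where C is the weight on Wrongs."""
    return rights + c * wrongs


def attempts(rights, wrongs):
    """Attempts, assuming each attempted item is either right or wrong."""
    return rights + wrongs


# A hypothetical paper with 30 Rights and 5 Wrongs.
speed_score = formula_score(30, 5, 1)     # C = +1 credits Wrongs fully
penalty_score = formula_score(30, 5, -1)  # C = -1 is the Rights-minus-Wrongs penalty
```

When C is +1 the formula score and the number of Attempts coincide, so the score measures speed alone. A negative C is what would be expected if accuracy carried the predictive weight.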

Neither does this mean that scoring formulae might not yield
desirable advantages in increased correlation with the criterion of
tests to be used for predicting rates of progress in bookkeeping or
typing. Administration of the test will be much simpler if
"Rights" (R) only are used as a score, and consequently the
number of Rights has been used as the score in all the correlations
later included in this report.

Standard Grouping Tables, Used in Transmuting Unit Test Scores

Call It    Score Is the Corresponding Class on the Left and the      Call It
Class:     Right When the Class Interval, I, Is:                     Class:

             1       2        3        4        5         6

   1         0      0-1      0-2      0-3      0-4       0-5           1
   2         1      2-3      3-5      4-7      5-9       6-11          2
   3         2      4-5      6-8      8-11    10-14     12-17          3
   4         3      6-7      9-11    12-15    15-19     18-23          4
   5         4      8-9     12-14    16-19    20-24     24-29          5
   6         5     10-11    15-17    20-23    25-29     30-35          6
   7         6     12-13    18-20    24-27    30-34     36-41          7
   8         7     14-15    21-23    28-31    35-39     42-47          8
   9         8     16-17    24-26    32-35    40-44     48-53          9
  10         9     18-19    27-29    36-39    45-49     54-59         10
  11        10     20-21    30-32    40-43    50-54     60-65         11
  12        11     22-23    33-35    44-47    55-59     66-71         12
  13        12     24-25    36-38    48-51    60-64     72-77         13
  14        13     26-27    39-41    52-55    65-69     78-83         14
  15        14     28-29    42-44    56-59    70-74     84-89         15
  16        15     30-31    45-47    60-63    75-79     90-95         16
  17        16     32-33    48-50    64-67    80-84     96-101        17
  18        17     34-35    51-53    68-71    85-89    102-107        18
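
The standard grouping table follows a simple rule, which can be sketched in Python as below. The function name is hypothetical, and the rule itself (class = whole number of class intervals contained in the score, plus one) is read off the tabulated values rather than stated in the text.

```python
def transmute(score, interval):
    """Return the class (transmuted score) for a raw score and class interval I."""
    if score < 0 or interval < 1:
        raise ValueError("score must be >= 0 and interval must be >= 1")
    # Integer division counts the full class intervals below the score;
    # classes are numbered from 1, so add one.
    return score // interval + 1


# With a class interval of 3, raw scores 12 to 14 fall in class 5
# and raw scores 15 to 17 in class 6, as in the example in the text.
print([transmute(s, 3) for s in (12, 13, 14, 15, 16, 17)])  # [5, 5, 5, 6, 6, 6]
```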

At this stage the technique for determining the n best tests was
not yet available. Neither was the requisite amount of time
available for solving all the possible intercorrelation coefficients
and resorting to the "trial and error" method, which was the only
feasible method then available for handling a situation of such
magnitude as that involving 34 test variables. Instead, by the
consensus judgment of three test workers, aided by the data of
Table XVII, the following twelve tests were selected for further
evaluation:

Unit Test Nos. 6, 10, 12, 17, 20, 21, 23, 24, 25, 26, 28, 32.

These were determined on the bases of (1) the magnitude of
their correlations with the several criteria; (2) the fact that they
gave positive correlations with all three criteria and required a
minimum of revision in order to be thoroughly objective and
satisfactory in their administration and scoring.

At  this  point  the  tests  were  revised  slightly  by  the  addition  of 
geometrical  improvements  in  the  test  page,  the  changing  of  time 
limits  and  the  re-wording  of  a  very  few  questions  which  had  given 
trouble  previously.  The  changes  were  not  of  such  a  nature  as  to 
be  expected  to  cause  any  marked  changes  in  the  correlations  of  the 
tests,  either  among  themselves  or  with  the  criteria. 

The intercorrelations of the 12 test variables above named were
now computed for each of the three criteria. By Dr. T. L.
Kelley's "trial and error" method, the beta weights (see Table
XIX) were determined through two or three successive
approximations. These were not carried out further owing to lack of
time.

The seven starred tests, 6, 12, 17, 20, 24, 25, 26, became the
Army clerical tests published by the Army E and R schools in
1921. The needed constants of the army edition are shown in
Table XX.

The  Method  of  Determining  the  Nine  "Generally  Best" 
Clerical  Tests 

After the seven tests had been selected by the process above
described, the writer discovered the method of multiple ratio
correlation. Without having tested out the truth of the
assumption, it was assumed that when two tests added in turn to a
previously existing composite yield different multiple ratio
regression weights, that one which gives the higher weight would give
also the higher multiple ratio correlation. This was subsequently
determined to be a false assumption, since the multiple ratio
correlation is a function not only of the correlation of the variable
added with the previously existing composite, but also of its
correlation with the criterion. It has subsequently been
empirically determined that the assumption here used is a close
approximation to the truth, although, by using that method, we perhaps
did not secure the maximum possible correlation nor the same
selection of tests which we would have secured had we used a
method based upon the multiple ratio correlation rather than upon
the multiple ratio regression weight. The false assumption then
does not mean any error in the correlations or weights secured, but
means only that possibly the best selection was not made.

TABLE XIX

Last Approximation to Beta Weights of Twelve Clerical Tests

            Last Approximation to          Revised
            Weight for Criteria            Time Limit,               New Army
Test No.    Typ.     Stenog.   Bkg.        Minutes       Selected    Test No.

    6       .22       .60     -.02           2              *           1
   10       .17       .46      .06           6
   12       .20       .55      .26           3½             *           3
   17       .18       .40      .19           5              *           5
   20       .53       .59      .03           2½             *           4
   21       .10       .55      .07           3
   23       .03       .20      .20           2½
   24       .51       .18      [?]           4½             *           6
   25       .52       .46      .27           3              *           7
   26       .21       .48      .24           3½             *           2
   28       .03       .07      .27           3
   32       .10       .55      .08           1½

12 Tests    N=37      N=37     N=49        Total 40 min.

Total time of 7 selected tests = 24 minutes

[?] Only two of the three weights for Test 24 are legible; they are
given in the order in which they appear, and the third is missing.

The twelve tests which had been previously weighted by Dr.
Kelley's "trial and error" method were subjected to the two
following procedures:

The formula for determining rIC by giving zero weight to the
variable considered ¹ was applied in succession to variables 17, 12,
6, 25, 24, 26, 20. This showed that tests 17, 12 and 6 might be
immediately discarded from further consideration as contributing
but little to the three combination correlation coefficients when
the other nine tests are included with their then available weights.
The results also showed that tests 25, 24 and onwards in the present
C-1 series could not profitably be discarded from the combination,
as their elimination would have large effects in decreasing the
values of the three rIC's.

¹ Kelley, T. L. Tables to Facilitate the Calculation of Partial Coefficients of
Correlation, etc. Bulletin 27. Univ. of Texas, Austin, Tex., 1914 (out of print),
Formula b, p. 23.

The list of intercorrelations which were computed was increased
finally to include tests 20, 24, 26, 28, 23, 32, 21, 10, 1, 11, 13, 16,
27, 9, 25, Age, School Grade.

1. Test 20, which had the highest average correlation with the
three criteria, was taken to be the backbone test for each of the
three scales which we were beginning to construct, a typing scale,
a stenography scale and a bookkeeping scale.

2. The multiple ratio regression weights for each of the eight
other variables in the case of each of the three criteria were
computed. By this procedure Test 24 proved to have the highest
average multiple ratio regression weight. Accordingly, it was
taken to be the second test in each of the three scales under
construction. The multiple ratio correlation coefficients rIC
were computed for each of the three scales. These yielded
increases in the correlation coefficient in each case above the
correlations which obtained between the criteria and Test 20 alone.

3. By computing the multiple ratio regression weights of the
remaining seven variables for each of the three criteria in turn,
Test 26 proved to have the highest average multiple ratio regression
weight. Again, with Test 26 included as the third test, the new
increased multiple ratio correlation coefficient was computed for
each of the three criteria and all gave increases in the correlations
of the several criteria above the previously existing composites.

4. By a repetition of the above process the other tests thus
to be added are in turn Tests 28, 13, 9, 11, 16, 25 and School
Grade.
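
The step-by-step building of a scale can be sketched in Python as a greedy forward selection. This is not the writer's multiple ratio method, whose formulae are not reproduced in this section. As a stand-in it uses the ordinary least-squares multiple correlation, and at each step it adds the test giving the highest average multiple correlation over the criteria, instead of the highest average regression weight. All function names and all correlations in the example are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for c in range(col, n + 1):
                    m[row][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r(intercorr, validity, subset):
    """Least-squares multiple correlation of the tests in `subset` with one criterion."""
    sub = [[intercorr[i][j] for j in subset] for i in subset]
    val = [validity[i] for i in subset]
    betas = solve(sub, val)
    return sum(b * v for b, v in zip(betas, val)) ** 0.5


def forward_select(intercorr, validities, k):
    """Greedily pick k tests, maximizing the average multiple R over all criteria."""
    chosen, remaining = [], list(range(len(intercorr)))
    while remaining and len(chosen) < k:
        def average_r(t):
            rs = [multiple_r(intercorr, v, chosen + [t]) for v in validities]
            return sum(rs) / len(rs)
        best = max(remaining, key=average_r)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical intercorrelations of three tests, and their correlations
# with two criteria (one list of validities per criterion).
intercorr = [[1.0, 0.7, 0.1],
             [0.7, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
validities = [[0.50, 0.45, 0.35],
              [0.45, 0.40, 0.30]]

print(forward_select(intercorr, validities, 3))  # [0, 2, 1]
```

In this invented example the third test is added before the second, although its own correlations with the criteria are lower, because it overlaps far less with the test already chosen.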

Without the inclusion of School Grade, the correlations with
the criteria of the three scales are as follows:

Typing scale with typing criterion                        r = .62 ± .07
Stenography scale with stenography criterion              r = .71 ± .06
Bookkeeping scale with bookkeeping criterion              r = .59 ± .06

The General Business criterion was subsequently computed, and
the General Business scale, when the tests are entered in the order
as previously determined, correlates with the General Business
criterion to the extent of .58 ± .05. With the inclusion of School
Grade, which may be used as a test, the correlations with the
several criteria are as follows:

Typing scale with typing criterion                        r = .62 ± .07
Stenography scale with stenography criterion              r = .71 ± .06
Bookkeeping scale with bookkeeping criterion              r = .64 ± .06
General business scale with general business criterion    r = .59 ± .05

The  probable  errors  of  these  correlation  coefficients  range  from  .05 
to  .07.  It  will  thus  be  seen  that  these  scales  in  the  clerical  field 
compare  quite  favorably  with  scales  for  predicting  success  in 
academic  school  courses.  The  weights  in  some  of  the  scales  for 
some  of  the  tests  come  out  negative.  This  is  undesirable  in  a 
general  scale  such  as  a  general  intelligence  scale,  since,  if  it  became 
known  to  the  pupils  that  credit  were  subtracted  for  a  high  score  on 
a  test,  it  would  be  unlikely  that  any  one  would  make  a  high  score 
on  a  test.  However,  in  the  differentiation  between  different 
business  school  courses,  these  negative  weights  are  ones  which  are 
quite  desirable  for  producing  differential  variations  in  the  total 
fitness  scores  of  a  given  person  for  each  of  the  three  courses. 
Negative  weights  are  then  actually  desirable  in  the  problem  of 
securing  differential  fitness  scores  for  various  occupations  or 
various  school  subjects.  We  are  badly  in  need  of  a  formula  for 
the  reliability  of  such  weights  in  order  to  know  whether  such 
negative  weights  will  remain  of  the  same  approximate  magnitude 
were  the  weights  to  be  recomputed  on  a  second  experimental 
group,  or  even  whether  they  would  remain  negative  on  a  second 
experiment.  If  the  probable  error  of  such  a  weight  were  too  high 
we  would  be  uncertain  whether  a  negative  weight  would  retain  its 
negative  sign  upon  a  second  repetition  of  the  experiment. 
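
The probable errors quoted for the correlation coefficients can be checked with a short Python sketch. The formula is not stated in the text; the sketch assumes the conventional one, P.E. = 0.6745 (1 - r²) / √N, which agrees with the quoted values to within rounding. The function name is hypothetical.

```python
import math


def probable_error_r(r, n):
    """Conventional probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Correlations and numbers of cases as quoted for the scales with School Grade included.
for label, r, n in [("Typing", 0.62, 37),
                    ("Bookkeeping", 0.64, 49),
                    ("General business", 0.59, 81)]:
    print(label, round(probable_error_r(r, n), 3))
```

Because the probable error shrinks only with the square root of N, groups of 37 to 81 cases leave an uncertainty of .05 to .07 even for correlations of about .6.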

If further experiment were to prove the feasibility of such
differential weights for predicting progress in different occupations
or in different occupational courses, the added validity secured by
such differential weights would justify our giving a much larger
number of tests than we intend using in any one particular scale
and then weighting the n highest tests in the case of each of the
various courses or occupations for which we wished to determine
the fitness of the individual. With, say, twenty tests so chosen
that there would be ten extremely good tests for predicting ability
in four different lines, one would score all twenty tests but would
weight in the case of each of the four courses considered only those
ten tests which best predicted ability in that course. The
addition, of course, of the other ten tests to each of the scales would
add something to the predictive value but hardly enough to
justify the extra trouble involved in weighting them. The
weighting can be easily taken care of by mechanical stencils which give
the weighting fitness scores corresponding to all possible raw
scores on the various tests.

TABLE XXI

Table of Constants for General Clerical Test, C-1, B-C Business College *

                       Old     New
        Old     New    Cum.    Cum.
Test    Time    Time   Time    Time        Multiple Ratio rIC                      β-Weights
No.     Min.    Min.   Min.    Min.     Typ.   Sten.   Bkg.   Gen. Bus.     Typ.   Sten.   Bkg.   Gen. Bus.

20      3½      2½     3½      2½       .546   .497    .317   .425          1.00   1.00    1.00   1.00
24      5       4½     8½      7        .549   .536    .407   .474           .11    .66    1.05    .65
26      4       3½     12½     10½      .550   .607    .461   .520           .05   1.05    1.19    .83
28      2       3      14½     13½      .551   .651    .519   .522          -.08   -.77    2.00    .15
13      5       5      19½     18½      .587   .668    .551   .539           .41    .46    1.80    .64
9       5½      5½     25      24       .588   .693    .571   .560          -.08    .73    1.83    .84
11      3       4      28      28       .610   .693    .574   .568           .34    .05     .79    .55
16      3       2      31      30       .614   .705    .578   .568           .21   -.46    1.15    .12
25      4       3      35      33       .615   .710    .594   .584          -.10    .31    2.19   1.04
School
Grade   0       0      35      33       .621   .714    .641   .588           .27    .27    4.83    .42

Number of cases                         37     37      49     81            37     37      49     81
Sampling correction for rIC             .008   .005    .006   .004
P.E. of rIC                             .069   .056    .057   .049
Highest                                 Test   Test    Sch.   Test
                                        20     20      Grd.   20
                                        .546   .497    .484   .425

* The Probable Error of the Correlation Coefficients in Table XXI may be found from the
following table:

                           P.E.r When r Is:

n =      0     ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9

37      .11    .11   [.11]   .10    .09    .08    .07    .06    .04    .02
49      .10    .10    .09    .09    .08    .07    .06    .05    .03    .02
81      .08    .08    .07    .07    .06    .06    .05    .04    .03    .01


TABLE XXII

Table of Multiple Ratio Correlations of Three Scales as the Less
Useful Tests Are Eliminated in Order, Retaining the Best Test
for the Final Single Test Scale. Army Form

                                Time in              rIC
Tests in Series                 Minutes      Typ.    Sten.   Bkg.

20                               2½          .546    .497    .317
26, 20                           6           .547    .596    .305
24, 26, 20                      10½          .549    .605    .416
25, 24, 26, 20                  13½          .554    .605    .528
6, 25, 24, 26, 20               15½          .554    .612    .519
12, 6, 25, 24, 26, 20           19           .556    .611    .537
17, 12, 6, 25, 24, 26, 20       24           .556    .611    .541

Number of Cases *                            37      37      49

* For P.E.r see footnote to Table XXI.


The writer has long been keenly aware of the seemingly great
waste involved in giving a new test whenever some little additional
fact is to be discovered in regard to a person's ability. In some
school systems, where testing is quite in vogue, the pupils may
receive as many as fifty tests in the run of a year. A composite of
all of these fifty would have predicted any one of the facts which
any of the scales was aiming to determine very much better than
any single one of the scales. Any twenty-five of those fifty tests,
selected at random, would undoubtedly have predicted any of
those abilities better than any two of the scales used. Knowing
in addition one's reading and arithmetic ability, one can predict
better his ability to get along in a vocational course than if he
merely knows his score on some mechanical test such as the
Stenquist Assembly Test. The Stenquist Assembly Test probably
correlates higher with success in many mechanical courses
than any single one of the tests which we might give, but the
Stenquist test plus any kind of intelligence test will assuredly tell
us more about ability to progress in a mechanical course than will
the Stenquist Test alone. Likewise, the school records which
have accumulated during the school career of pupils, if available
in the 7th or the 8th grade, may have very high value for predicting
their further success either in industry or in continued school
work. This is shown under the discussion of this topic in a
previous section of this report.

The B-D Army Business School Results

In order to determine whether the nine "generally best" tests
and School Grade, ten variables in all, selected upon the results
from the B-C Business College, would yield reliable predictions
with the weights given them, when applied to a different business
school group, the results of these ten variables were worked up
on the B-D business school soldiers who took the thirty-two
business school tests at the same time as the B-C Business College
students. The tests were given under standard conditions by the
camp examiner, Mr. F. A. Moss. The pupils were rated four
times independently on the traits defined below by their
instructors on a scale of 1 to 5 ("lowest, low, average, high, highest").
The instructions for the ratings were typewritten and handed to
the instructors, and for the four ratings in the subjects of
stenography, typing and bookkeeping were as follows:

Stenographers:

a)  First  ranking:  Include  in  your  judgment  of  merit  the  factors 
which  you  usually  take  into  consideration  as  contributing  to 
progress  in  stenography. 

b)  Second  ranking:  Consider  here  only  the  readiness  and 
accuracy  with  which  the  pupil  grasps,  remembers,  and  employs 
the  theory  of  stenographic  characters. 

c)  Third  ranking:  Consider  only  the  interest  in  the  work, 
ambition  to  succeed,  and  general  class  morale. 

d)  Fourth  ranking:  Consider  exactly  the  same  factors  that  you 
used  in  the  first  ranking. 

Typists: 

a)  First  ranking:  Include  in  your  judgment  of  merit  the  factors 
which  you  usually  take  into  consideration  as  contributing  to 
progress  in  the  acquirement  of  skill  in  typing. 

b) Second ranking: Consider here only your impressions of the
order in which you would choose the pupils if you wanted a
faithful copy of matter submitted, handed in promptly. Record the
order.

c)  Third  ranking:  Consider  only  interest  in  the  work,  ambition 
to  succeed,  and  general  class  morale. 

d)  Fourth  ranking:  Consider  exactly  the  same  factors  that  you 
used  in  the  first  ranking. 

Bookkeepers: 

Follow  the  same  general  principles  as  are  laid  down  for  securing 
data  for  criteria  scores  for  stenographers  and  typists. 

1. Secure as many grades of all sorts as are available. Any
numerical grade is valuable if reported for all members of the
class. Grades reported in terms of "Excellent," "Good," etc.,
are of considerable value as are also letter grades of "A," "B,"
"C," etc. All are of more value if taken at different intervals of
time throughout the course. Each individual grade series should
be reported.

2.  Ratings  of  speed  and  accuracy  should  be  reported  separately. 

3.  Secure  independent  judgment  rankings  in  order  of  merit  by 
the  method  described  above,  using  the  following  central  theme  as 
a  basis  for  judgment  on  each  occasion: 

a)  First  ranking:  Consider  general  ability  to  make  progress  in 
acquiring  a  mastery  of  the  course. 

b) Second ranking: Consider the probability that the pupil's
books will "prove up" on any trial balance, or in other words that
his books are accurate.

c)  Third  ranking:  Consider  only  interest  in  the  work,  ambition 
to  succeed,  and  general  class  morale. 

d)  Fourth  ranking:  Consider  exactly  the  same  factors  that  you 
used  in  the  first  ranking. 

With these ratings arranged in parallel columns, it was apparent
that the standard deviations were approximately equal in the four
columns of ratings of each subject. Accordingly, the gross
measures of the four ratings were summated, which procedure
amounted approximately to obtaining the average position in
the four rankings. This combined score became the criterion
in the courses in typing, stenography, and bookkeeping,
respectively. The weights and standard deviations used for the ten
variables respectively are shown in Table XXIII. The
correlations between the test scores weighted by each of four weightings
and these criteria were computed and are given in Table XXIV.
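
The relation between the β-weights, the standard deviations and the gross score weights of Table XXIII can be sketched in Python. The relation used, gross score weight = β / σ, is inferred from the tabulated values and is not stated in so many words; the function names and the pupil's raw scores are hypothetical.

```python
def gross_score_weight(beta, sigma):
    """Weight to apply to a raw (gross) score: the beta weight divided by sigma."""
    return beta / sigma


def weighted_total(raw_scores, weights):
    """Weighted sum of raw scores over the variables that carry a weight."""
    return sum(weights[name] * raw_scores[name] for name in weights)


# Beta weights and sigmas from the bookkeeping columns of Table XXIII
# for Test 20 and School Grade (SG).
beta = {"20": 1.00, "SG": 4.83}
sigma = {"20": 15.59, "SG": 1.81}

weights = {name: gross_score_weight(beta[name], sigma[name]) for name in beta}
print({name: round(w, 4) for name, w in weights.items()})  # {'20': 0.0641, 'SG': 2.6685}

# Hypothetical pupil: raw score of 56 on Test 20, school grade 9.
print(round(weighted_total({"20": 56, "SG": 9}, weights), 2))
```

Dividing by σ puts tests with widely different score ranges on a common footing before the β-weights are applied.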

There is every reason to believe that the criterion here is very
much less reliable than that at B-C business college; consequently,
even lower correlations of the scale with the criterion would be a
substantiation of the validity of the tests for measuring ability to
progress in acquiring typing, stenography, and bookkeeping.

TABLE XXIII

The σ's and M's of Selected 9 Tests, Old (B-C Business College) Time
Limits. B-C Business College

Typing: N = 37;  Stenog.: N = 37;  Bkg.: N = 49;  Genl.: N = 81

                  σ's (Old Time)                        M's (Old Time)
Test No.   Typ.     Sten.    Bkg.     Genl.       Typ.     Sten.    Bkg.     Genl.*

20         13.49    16.34    15.59    16.070      63.47    60.39    56.62
24          3.76     3.73     4.18     4.073      10.97    10.97     9.65
26          9.64     9.70    10.18     9.840      33.03    32.30    32.29
28          8.66     8.19     8.19     8.062      17.74    17.96    16.91
13          2.61     2.56     2.83     2.797       6.65     6.40     6.96
9           [?]     10.17     8.56     9.140      30.27    29.38    28.55
11          3.99     4.34     4.74     4.367      27.31    26.77    24.42
16          6.14     5.79     6.19     5.879      13.74    13.85    13.60
25          6.65     6.46     6.10     6.269      11.96    11.15    13.81
SG          1.64     1.67     1.81     1.767      10.11    10.03     9.46

                  β-Weights                             Gross Score Wgts.
Test No.   Typ.     Sten.    Bkg.     Genl.       Typ.     Sten.    Bkg.     Genl.

20         1.00     1.00     1.00     1.00        .0741    .0612    .0641    .0622
24          .11      .66     1.05      .65        .0293    .1769    .2512    .1596
26          .05     1.05     1.19      .83        .0052    .1082    .1169    .0843
28         -.08     -.77     2.00      .15       -.0092   -.0940    .2442    .0186
13          .41      .46     1.80      .64        .1571    .1797    .6143    .2288
9          -.08      .73     1.83      .84       -.0078    .0718    .2138    .0919
11          .34      .05      .79      .55        .0852    .0115    .1667    .1259
16          .21     -.46     1.15      .12        .0342   -.0794    .1858    .0204
25         -.10      .31     2.19     1.04       -.0150    .0480    .3590    .1659
SG          .27      .27     4.83      .42        .1646    .1617   2.6685    .2377

* Not computed for lack of time.
[?] Illegible.


TABLE XXIV

Correlations of Ten Variables, Weighted by Each of Four Series of
Weightings, With the Criterion Scores of Stenography, Typing
and Bookkeeping, Respectively *

                Correlations with the Respective Criteria When
                Weighted by the Multiple Ratio Weights
                Previously Determined For:                           Number
                                                                     of
Criterion       Typing   Stenography   Bookkeeping   General         Cases
                                                     Business

Typing           .27        .34           .23          .18            49
Stenography      .79        .87           .73          .79             8
Bookkeeping      .42        .52           .35          .28            19

* The Probable Error of the Correlation Coefficients in Table XXIV may be
found from the following table:

                           P.E.r When r Is:

N        0     ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9

8       .24    .24    .23    .22    .20    .18    .15    .12    .09    .05
19      .15    .15    .15    .14    .13    .12    .10    .08    .06    .03
49      .10    .10    .09    .09    .08    .07    .06    .05    .03    .02


Theoretically, it would be better to use, in determining the
business school gross score weights, the β's previously determined
and to use the σ's of the new groups. This was not done because
of lack of time, and also from the consideration of the fact that any
scale for wide use will probably have to have predetermined gross
score weights. If all of the standard deviations changed
proportionately on the several tests upon a change in the level of ability
of test subjects, the relative importances assigned to the several
tests would remain constant. The variabilities probably do not
vary enough from proportional changes to make any marked
effect in the final results. The likelihood of the above statements
being true is enhanced by the consideration of the fact that, with
the relative weights of tests approximately correct, fairly large
changes in the weights, if made at random, will have but a small
effect on the multiple ratio correlation. This theorem allows one
to use integral gross score weights, which are but good
approximations to the true relative gross score importances, with but
little variation of rIC from its maximum amount.
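
The statement that approximate weights cost little can be illustrated with a small Python sketch. It uses the ordinary product-moment correlation between a weighted sum of standardized test scores and a criterion, not the multiple ratio formula itself, and the two tests and all their correlations are hypothetical.

```python
import math


def composite_correlation(weights, validities, intercorr):
    """Correlation of a weighted sum of standardized tests with a criterion."""
    k = len(weights)
    numerator = sum(weights[i] * validities[i] for i in range(k))
    variance = sum(weights[i] * weights[j] * intercorr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


# Two hypothetical tests: correlations with the criterion, and with each other.
validities = [0.50, 0.40]
intercorr = [[1.0, 0.5],
             [0.5, 1.0]]

best = composite_correlation([0.4, 0.2], validities, intercorr)  # least-squares weights
integral = composite_correlation([2, 1], validities, intercorr)  # same ratio, whole numbers
equal = composite_correlation([1, 1], validities, intercorr)     # much cruder weights

print(round(best, 3), round(integral, 3), round(equal, 3))  # 0.529 0.529 0.52
```

Only the ratios of the weights matter, so whole-number weights in the right proportion lose nothing, and even equal weights lower the correlation here by less than .01.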

After  the  twenty-eight  page  folder  of  tests  (to  be  described 
later)  had  been  administered  to  the  Company  I  employees  and  the 
correlations  had  been  computed,  it  became  evident  that  Test  1, 
Arithmetic  (Correct  and  Incorrect  Additions  and  Subtractions, 
the  requirement  being  that  errors  be  checked)  was  one  of  the  best 
tests  in  predicting  the  criterion  scores  in  Company  I.  It  was 
added  as  an  additional  test  at  a  gross  score  weight  of  3,  which 
gives  it  an  importance  about  the  same  as  that  of  Test  9.  This 
addition may not add materially to the multiple ratio correlation
coefficient. Since, however, it is placed as No. 1 of the ten tests
of  the  I.E.R.  General  Clerical  Scale,  it  acts  as  a  good  "buffer 
test."  The  requirements  are  readily  grasped  by  any  clerical 
worker  and  three  minutes  spent  on  this  test  serves  to  allay  any 
nervousness  which  the  test  subject  may  have.  Even  though  the 
addition  of  such  a  test  does  not  add  materially  to  the  validity  of 
the  test  it  may  add  something  to  its  reliability.  Its  arbitrary 
inclusion, therefore, seems quite justified. The nine tests hitherto
described, plus this new test, became the I.E.R. General Clerical
Test, C-1, sometimes referred to as the Toops Clerical, or Toops
Business, Test.

The  Reliability  of  the  I.E.R.  General  Clerical  Test 

The  I.E.R.  General  Clerical  Test,  Form  A,  used  throughout 
this  investigation,  was  given  by  Mr.  Luton  Ackerson  in  April 
1922  to  seniors  (3-year  course)  in  business  classes  in  the  Julia 
Richman  High  School.  Form  B  was  given  the  following  June 
to  the  same  pupils.  There  were  145  pupils  who  took  both  forms. 
The tests were weighted with the general gross score weights:

             Wt.                  Wt.

Test  1       3 ¹      Test 20     2
Test  9       3        Test 24     6
Test 11       3        Test 25     4
Test 13      10        Test 26     2
Test 16       4        Test 28     1½

The correlation between the total score of the first and second
giving is .82 ± .02.

Inasmuch as a high school business group in the senior year is a
highly selected group, it seems quite likely that the reliability of
the scale when applied to a thirteen- or fourteen-year-old group
would be substantially of the same order as the reliability
coefficients of the well-known intelligence scales. The average
score on Form A was 71.7 and on Form B, 81.0, or a gain of 13 per
cent of the second giving upon the first. It is obviously
impossible to state to what extent this gain is due to an initial lesser
difficulty of Form B over Form A. The variability is about the
same in both cases, the standard deviation being 11.3 for Form A
and 11.5 for Form B.

¹ This weight refers to the original form of Test 1, where the incorrect additions
only were checked; in the new printed form where both correct and incorrect
additions are marked, the weight of Test 1 is 1½.

Correlation of I.E.R. Test C-1 with Success in Actual
Clerical Work

We have shown that the selected set of nine (or ten) tests is
predictive of success in business schools. We have now to
describe its value in diagnosis and prediction of success in actual
clerical work.

As  always,  the  only  decisive  experiments  will  be  those  where  the 
tests  are  given  to  young  people  whose  later  careers  are  followed. 
However,  we  may  profitably  study  individuals  already  engaged  in 
clerical  work,  comparing  the  scores  they  make  in  the  tests  with 
their  demonstrated  success  "on  the  job." 

A valid measurement of demonstrated success in clerical work is
not easy to find and apply. Probably the most reasonable
measure to use would be salary (per unit of time) attained at
equal age after equal length of experience, with some allowance for
pleasant or unpleasant conditions of work. We have been
unable to find any group available for test whose individuals could
be so measured. The next best criterion would be a ranking in
order of success given by superior officers who would allow, at
least roughly, for age and experience, and who would consider
salary, attractiveness of work, and promise of promotion due to
the quality and quantity of work being done by the individual in
comparison with those of like present salary. We have been
able to obtain a criterion approximating to this in the case of 73
clerical employees in Company W. We had also the very great
advantage of being able to test these individuals with the later
constructed I.E.R. Test C-2 and the Stenquist Assembly Test.

The opportunity to give the tests to these employees of
Company W came, however, only near the end of our year's work.
In the meantime, we secured such data as we could; and it seems
best in this report to follow roughly the chronological order of our
work.

Tests  of  Company  I  Employees 

The group of ten business tests chosen for the final series, and a
number of other tests in addition, were given to 301 employees of
Company I. The efficiency ratings on two of the subjects were
not available at the time, so that the report on the combined
groups below consists of only 299 cases.

Four alternative forms of Unit Test 20, Vocabulary, were assembled
and given to these subjects. Three forms of Test 28, Coordinates,
were also prepared and given. In addition to the series of ten tests
of C-1, five forms of the Woodworth-Wells Number Checking, Test 39,
were provided and given, each form with a two-minute time limit.
Three forms of finding addresses, Test 40 (Test 6 of the Thorndike
Non-Verbal Clerical C-2, but differing in that the C-2 series has a
six-minute time limit), were prepared and each form was given with a
time limit of three minutes. A test of writing numbers, Test 41, in
squares of uniform size, [?] inch square, beginning with 11 and
continuing upward serially, was given with a two-minute time limit.
Test 18, from the Army Beta, Same-Different Numbers, was given with a
three-minute time limit. The test of alphabetical filing of names in
the revised form (Unit Test 23), involving the idea originally used
in the Thurstone name-filing test of his clerical series, was given
in two forms with a time limit of 2½ minutes each. The test was
changed so that the subject had merely to write on the blank
preceding each name in the work column the number corresponding to
that name as found in the alphabetical column. This changes the test
somewhat from Thurstone's original test but makes it more readily
scorable by stencil. And, finally, the last test was the Company I
form of same-different numbers and names originally modeled after
forms used by Thurstone, Thorndike, and Army Beta. This test is
referred to as "Page 28" in the intercorrelation tables.

The employees had all been rated by their superintendents some two
or three weeks previous to the giving of the test in the routine
periodical rating which is taken every six months by this firm. The
rating scale used is the Company I revision of the Scott rating scale
plan adapted to clerical employees. The company has an elaborate set
of tables in which the wages of employees are supposed to be
regularly advanced according to the increase in ratings received on
this rating scheme. The detail with which these tables have been made
out, and the fact that psychological tests and ratings have been a
fundamental part of the personnel work of this firm for a number of
years, led us to believe that in all probability the ratings of these
employees were as accurate as were to be found anywhere in industry.
This group of clerical workers was chosen for that reason and for
the additional one of the hearty interest in the tests evidenced by
the management.

Without going into the laborious procedure of weighting each
test according to the exact weights determined from the B-C
Business College group, we assigned the arbitrary weights of the
following table to the several tests. These result in the approximate
gross score weights, hereinafter called "gross score weights,"
of 3, 3, 3, 10, 4, 2, 6, 4, 2, 1½ for Tests 1, 9, 11, 13, 16, 20, 24, 25,
26, 28 respectively.

TABLE XXV

Derivation of Arbitrary Gross Score Weights of Test C-1

                                                      Four                       Three
                                                      pages*                     pages*
Test No.                  1     9    11    13    16   20(1-4)    24    25    26  28(1-3)

Arbitrary β Weight       .6    .7    .6    .7    .6     .8       .6    .5    .6    .4
σ of 299 Cases          8.8   9.8   8.0   2.4   5.3   56.8      3.7   4.6   9.5  28.4
Approximate Weight of
  Gross Scores = 40 × W   3     3     3    10     4     ½         6     4     2    ½

* When only one page (one form) of Test 20 is given, it should be given a gross
score weight of 2 in order to maintain its relative importance; similarly, when only
one page (one form) of Test 28 is given, it should be weighted 1½. The general
principle is that with n pages weighted W, one page should be weighted n × W in order
to maintain its importance relative to the other tests.
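
The arithmetic behind Table XXV can be checked with a short sketch. It assumes that W in the heading "40 × W" stands for the arbitrary β weight divided by σ, which the table does not state explicitly; the function names are illustrative only. The printed weights are rounded rather freely (Test 13 computes to about 11.7 but is printed as 10), so the sketch reports unrounded values.

```python
# Sketch of the Table XXV arithmetic, assuming W = beta / sigma.
# That reading of W is an assumption; the table heading gives only "40 x W".

# test number -> (arbitrary beta weight, sigma of the 299 cases), from Table XXV
TABLE_XXV = {
    "1": (0.6, 8.8),
    "9": (0.7, 9.8),
    "11": (0.6, 8.0),
    "13": (0.7, 2.4),
    "16": (0.6, 5.3),
    "20(1-4)": (0.8, 56.8),
    "24": (0.6, 3.7),
    "25": (0.5, 4.6),
    "26": (0.6, 9.5),
    "28(1-3)": (0.4, 28.4),
}


def raw_gross_score_weight(beta, sigma, scale=40.0):
    """Unrounded gross score weight: scale * beta / sigma."""
    return scale * beta / sigma


def single_page_weight(weight_when_n_pages_given, n_pages):
    """Footnote rule: with n pages each weighted W, one page alone gets n * W."""
    return n_pages * weight_when_n_pages_given


if __name__ == "__main__":
    for test, (beta, sigma) in TABLE_XXV.items():
        weight = raw_gross_score_weight(beta, sigma)
        print(f"Test {test:8s} raw weight = {weight:5.2f}")
    # One page of Test 20 (four pages at 1/2 each) and of Test 28 (three at 1/2).
    print(single_page_weight(0.5, 4), single_page_weight(0.5, 3))
```

The two single-page values agree with the weights of 2 and 1½ given in the footnote.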


The  299  cases  having  complete  records  were  divided  up  into 
four  clerical  groups  based  on  similarity  of  name  of  occupation. 
This  grouping  does  not  necessarily  mean  a  classification  based 
upon  the  greatest  similarity  of  work.    The  groups  are: 

Regular Business                                              90 cases
Typists                                                       28 cases
File Clerks                                                   32 cases
District, Transfer, and All Others not Otherwise Tabulated   149 cases
    Total                                                    299 cases

The regular business clerks are for the most part those who do the
odd-job clerical work and are the poorest paid. The typists and
file clerks for the most part have a more routine or specialized type
of work than is the case in most firms. Those district and transfer
clerks who work on specialized clerical work require in general
more ability than is required of the regular business clerks. With
this group, however, is combined a motley group of "all others"
about whom nothing in general can be said more than that they
are occupied on specialized tasks each requiring no more than 1
to 4 or 5 persons in the entire establishment. The following
correlations result:

Criterion and Weighted Scores                                  r

Typists and Regular Business Clerks¹ (118 cases)           .07 ± .06
Regular Business (90 cases)                                .15 ± .07
Typists (28 cases)                                         .02 ± .13
District, Transfer, and All Others (149 cases)             .03 ± .05
File Clerks (32 cases)                                     .14 ± .12


In  an  effort  to  discover  whether  length  of  service  had  any  effect 
on  the  criterion  (for  if  it  has  then  it  must  be  taken  into  account  in 
the  criterion  used  for  testing  the  tests),  the  following  correlations 
were  computed: 

Typists and Regular Business Combined (118 cases)

                                                                   r
Criterion and Nine Weighted Test Scores²                       .07 ± .06
Criterion and "Rated after Months" (Length of Specific Job)    .20 ± .06
Criterion and "Length of Time with Firm"                       .07 ± .06
Weighted Total Nine Tests² and Length of Time with Firm        .10 ± .06

Thus, as shown by the low correlation of criterion and experience
(either length of time with the firm or length of time on the
specific job), the criterion is not badly affected by experience
attenuation, at least with the two groups here combined.
Yet time on the specific job is a better predictor of criterion score
than the nine tests. Neither, of course, is high enough to be of
any value as a test.

Dr. Thorndike computed the correlation between the old Company
I examination devised by him in 1914 and the official 1921
rating in the case of twenty-four clerks examined in 1915 and at
work in 1921, with the result: r = .38 ± .12. This does not
necessarily indicate (with this N) a higher relationship than on the
nine selected tests. In the case of the 32 (larger N) file clerks,
rated by one supervisor, we have four separate tests which correlate
with the filing criterion to a greater extent than .30, the highest
single test correlating with the criterion to the extent of .48.


¹ Typists have highest criterion scores and regular business lowest; therefore this
combined group should give the maximum correlation.

² Test 1, Arithmetic, had not yet been added to the C-1 Clerical Series.

Some three months previous to the giving of these tests, the
management of Company I decided to revise the examination
which had been prepared by Dr. Thorndike for them in 1914.
The new examination did not take nearly so long to administer as
the earlier form of the examination and is given in omnibus form
with an over-all time limit restriction of 75 minutes. This form of
examination had been given to 307 cases whose ratings were
available for comparison with the new examination. The correlations
were worked out by Dr. Thorndike with the result that the
correlation of the present new examination of the Company's test
and new rating (rating used by us) is .16 ± .04 with N = 307 cases.


TABLE XXVI

Correlations of Various Tests with Company I Rating*

                                                  Correlation with Criterion of:
                              Unit                                 Dist.,      All
                              Test       Reg.             File     Transfer    Subjects
Test Name                     No.        Bus.   Typists   Clerks   and All     Combined      σ
                                                                   Others

Arithmetic                    1          .27     .15       .21      .04         .09         8.8
                              9          .03     .17       .07      .08        -.02         9.8
Number Copying                11        -.03    -.18       .09      .07         .03         8.0
Fruit Tabulation              13         .20     .00       .00      .04         .07         2.4
                              16         .11     .04       .17      .03         .08         5.3
Thorndike Vocabulary          20(1-4)    .14     .23       .15      .01         .01        56.8
                              24         .12    -.15      -.30      .07        -.05         3.7
Business Information          25        -.02    -.02       .24     -.10        -.06         4.6
                              26         .17    -.18       .31     -.11        -.01         9.5
[Coordinates]                 28(1-3)    .15     .10       .31      .00        -.01        28.4
Woodworth Number [Checking]   39(1-5)    .13     .05       .09     -.02         .01        73.8
[Finding Addresses]           40(1-3)    .07     .17       .03      .06         .01         7.2
Number Writing                41         .11    -.07      -.10      .05         .00        13.0
Same-Different Numbers        18         .15     .01       .07     -.07        -.04         6.5
[Alphabetical Filing]         23(1-2)   -.02    -.16       .34      .12         .03        14.0
Company I Same-Diff.          Page 28    .05     .15       .48      .04         .04        11.8
School Grade Completed                   .16     .25       .04      .02         .00
Number of Cases                           90      28        32      149         299

* The Probable Error of the Correlation Coefficients in Table XXVI may be found from
the following table:

                               P.E.r When r Is:

   N       0    ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9

   28     .13    .13    .12    .12    .11    .10    .08    .07    .05    .02
   32     .12    .12    .11    .11    .10    .09    .08    .06    .04    .02
   90     .07    .07    .07    .06    .06    .05    .05    .04    .03    .01
  149     .06    .05    .05    .05    .05    .04    .04    .03    .02    .01
  299     .04    .04    .04    .04    .03    .03    .03    .02    .01    .01
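
The probable errors attached to the coefficients in this chapter agree with the customary formula P.E. = .6745 (1 - r²) / √N. The report does not state the formula it used, so the sketch below rests on that assumption and has been checked only against printed values; the function name is illustrative.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient r based on n cases.

    Uses the customary formula 0.6745 * (1 - r**2) / sqrt(n); that the report
    used this formula is an assumption, though the printed values agree with it.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    # A few coefficients quoted in the text, as (r, N) pairs.
    for r, n in [(0.38, 24), (0.16, 307), (0.07, 118), (0.14, 32)]:
        print(f"r = {r:.2f}, N = {n:3d}: P.E. = {probable_error_of_r(r, n):.2f}")
```

The formula also shows why the small groups matter so little here: with N = 28 a coefficient near zero has a probable error of about .13, so differences of .10 or .15 between groups are well within chance.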

In  order  to  determine  whether  possible  coaching  outside  the 
test  room  was  responsible  for  the  low  correlations,  the  correlation 
between  the  weighted  nine-test  composite  score  and  criterion  for 
the  first  (November  30,  a.m.)  section  to  take  the  tests  (N  =  yj) 
was computed, with the result: r = -.05 ± .08. The r would be
expected  to  be  attenuated  by  coaching  only  in  the  case  of  later 
subjects,  since  the  first  group  might  communicate  information  to 
the  later  groups.  Hence,  we  conclude  that  coaching  was  not  a 
factor  in  producing  the  low  correlations. 

The correlations of the several tests with the criterion, all groups
combined (299 cases), give the fifth column of Table XXVI.
This table shows as well the correlations of the separate tests with
the criterion in the case of the four groups: A. Regular Business
Clerks (90 cases); B. Typists (28 cases); C. File Clerks (32 cases);
D. District Clerks, Transfer Clerks and all others not tabulated
in A, B, or C (149 cases).

The correlations of last School Grade Completed and the criterion
in the case of the four groups and the total group are as shown
on the last horizontal row of correlations in Table XXVI.

It  will  be  noted  that  none  of  the  correlations  above  reported  are 
of  sufficient  size  to  indicate  that  any  of  the  tests  given  are  of 
practical  value  for  predicting  the  criteria  of  the  separate  groups  or 
total  group.  A  few  of  the  single  tests  have  correlations  high 
enough  to  be  of  practical  value,  but  there  is  no  assurance  that 
these  would  obtain  upon  the  second  giving  of  the  tests. 

Much  might  be  written  in  attempted  explanation  of  these 
extraordinary  results,  but  it  seems  best  to  say  nothing,  since  we 
were unable to investigate the criterion itself.

The Correlations of the General Clerical Test with
Other Tests in the Case of School Children

Shortly after the experiment with the employees of Company I,
we conducted an extended series of experiments with school
children, which were designed to reveal the suitability of our
different tests for use with children, and the degree to which the
different tests did measure different abilities. The General
Clerical Test C-1 was satisfactory in respect to ease of giving and
scoring and suitability for children of the ages in question. But
it became clear from the correlations (cf. Table V, p. 22) that, in
the case of these children, it did not measure an ability much
different from that measured by any standard test of general
intelligence. The General Clerical test correlates nearly .80 with
the Arith.-Re. test, and correlates with "Half-year Gains" and
"Average Work" as closely as the Arith.-Re. test does,
approximately .60.

It may, therefore, be that the correlations of about .70 found
between the General Clerical Test (when properly weighted) and
success in business school work in stenography, typing, and
bookkeeping are due to its value as a test of ability to deal with ideas
rather than to its value as a test of ability to deal with clerical
items and procedures. Or these two abilities may be more
nearly identical than we have supposed. On the other hand, the
correlation of .80 need not preclude a considerable independence.¹


TABLE XXVII

Weights of the Separate Tests of the I.E.R. Test C-2

Test No.                             1     2     3      4      5      6    Total

β Weight                             1     1     1      2      3      3
Administration in Minutes            2     2     3      3      6      6     22
Combined Administration Time               7            3      6      6     22
Combined β Weight                          3            2      3      3
Approximate Interquartile Range
  of 7A, 7B, 8A, 8B, P.S. J
  Pupils (N = 201)                        22           4.6    5.5    2.5
β/Q                                      .14           .43    .55    1.2
β/Q/.11 = W (Gross Score Weight)           1            4      5     10


The I.E.R. Test C-2

These facts led us to provide the Routine Clerical Test C-2 of
abilities presumably less intellectual than those included in the
General Clerical Test C-1.

Clerical Test C-2 is made up of six tests which were found by
Thorndike to correlate with clerical ability, four of which were
found by McCall to correlate very slightly with general
intelligence in a grade population.

It seemed desirable, in the opinion of Thorndike, to weight the
tests according to the importances shown in Table XXVII.

The seventh and eighth grade papers of Public School J were
available for determining the relative variability of the tests.
The approximate Q's were determined by inspection of the
distributions, Tests 1, 2, and 3, all cancellation tests, being added
together for convenience. These yield the final gross score
weights respectively of 1, 4, 5, and 10. The scores weighted with
these weights are recorded in the master data books as "Thorn.
Wtd. Cler."

The distributions of the 7A-8B inclusive pupils' scores by tests
are given in Table XXVIII.

These distributions show that the time limits are quite
satisfactory. The directions for Tests 4 and 5 are seemingly quite too
difficult, since 20 per cent of the seventh and eighth grades of
Public School J make zero scores on Test 4, and 11 per cent on
Test 5. Because of administrative difficulties, Test 5 has been
eliminated from the form to be used in the 1922-23 investigation.
An alternative form of the five remaining tests is being
constructed.

Test C-2 was given to the 318 girls in the experiments with
school pupils. Its correlations with other tests appear in Table
V. It correlates much less closely with Arith.-Re. (about .65)
than C-1 does, and somewhat less closely with School Success
(about .45) than C-1 does. Since, as we shall see later, it
correlates almost as closely as C-1 with success in actual clerical work,
it has been retained as a part of the total testing plan.


¹ For, let us suppose that a good business test, which it would be practically
possible to construct, correlates to the extent of .80 with a valid clerical criterion,
and that this test correlates with general intelligence to the extent of .62. We may
determine the maximum correlation of the intelligence test with the clerical criterion
by assuming that the good clerical test plus the intelligence test properly weighted
predict the clerical criterion perfectly; or, call the correlation of the intelligence test
with the criterion r_{I1}, and then,

\[ 1 = \frac{(.80)^2 + r_{I1}^2 - 2(.80)(.62)\,r_{I1}}{1 - (.62)^2} \]

whence, r_{I1} = .967.

The intelligence test will correlate the minimum amount with the criterion when its
addition to the clerical test adds nothing to the efficiency already possessed by the
scale of one test, namely the clerical test, or,

\[ r_{I1} - (.80)(.62) = 0 \]

whence, r_{I1} = .496. That is, with the two correlations given as assumed, the
correlation of the intelligence test with the clerical criterion must lie between the limits
of .496 and .967. If intelligence should correlate as low as .496 with the clerical
criterion, then it measures nothing not already measured by the clerical test, which
by comparison, r = .80, does measure many specific clerical abilities not measured
by the intelligence test.
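
The two limits quoted in the footnote, .496 and .967, can be reproduced numerically. The sketch below assumes the standard formula for the multiple correlation of a criterion with two predictors; the footnote's own equations are not legible in the source, so this is a reconstruction of the arithmetic and not the authors' printed working. All names are illustrative.

```python
import math


def multiple_r_squared(r_1c, r_2c, r_12):
    """Squared multiple correlation of a criterion c with predictors 1 and 2."""
    return (r_1c ** 2 + r_2c ** 2 - 2.0 * r_1c * r_2c * r_12) / (1.0 - r_12 ** 2)


def limit_when_prediction_is_perfect(r_test_criterion, r_test_other):
    """Largest correlation of the second test with the criterion.

    This is the larger root of multiple_r_squared(...) == 1.
    """
    product = r_test_criterion * r_test_other
    spread = math.sqrt((1.0 - r_test_criterion ** 2) * (1.0 - r_test_other ** 2))
    return product + spread


def value_when_second_test_adds_nothing(r_test_criterion, r_test_other):
    """Correlation at which the second test adds nothing to the first.

    Equivalent to a zero partial correlation with the criterion.
    """
    return r_test_criterion * r_test_other


if __name__ == "__main__":
    upper = limit_when_prediction_is_perfect(0.80, 0.62)
    lower = value_when_second_test_adds_nothing(0.80, 0.62)
    print(f"upper limit = {upper:.3f}, no-gain value = {lower:.3f}")
    # Sanity checks: perfect prediction at the upper limit, no gain at the lower.
    print(round(multiple_r_squared(0.80, upper, 0.62), 6))
    print(round(math.sqrt(multiple_r_squared(0.80, lower, 0.62)), 6))
```

At the lower value the multiple correlation is still .80, which is the sense in which the intelligence test then "measures nothing not already measured by the clerical test."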
TABLE XXVIII

Distribution of Scores of Separate Tests of Test C-2, in the Case
of 201 Pupils of 7A to 8B Grades Inclusive. Public School J

 Tests 1+2+3         Test 4          Test 5          Test 6
 Score     Fr.     Score   Fr.     Score   Fr.     Score   Fr.

  70- 76     1        0     39        0     22        0      3
  77- 83     0        1      2        1      5        1      8
  84- 90     1        2      6        2      4        2      8
  91- 97     4        3      3        3      3        3     20
  98-104    10        4      5        4      8        4     19
 105-111     7        5      3        5     11        5     20
 112-118     9        6      3        6      5        6     24
 119-125    17        7      8        7      6        7     21
 126-132    13        8     13        8      6        8     16
 133-139    15        9     10        9      6        9     20
 140-146    19       10     21       10      9       10     15
 147-153    20       11     15       11     11       11     12
 154-160    15       12     19       12      9       12      8
 161-167    14       13     14       13     19       13      3
 168-174    19       14     15       14      7       14      1
 175-181    17       15     10       15      9       15      0
 182-188     8       16     10       16     11       16      1
 189-195     5       17      3       17      9       17      1
 196-202     4       18      0       18      9       18      0
 203-209     2       19      0       19      4       19      1
 210-216     0       20      0       20      6
 217-223     1       21      0       21     [1]
                     22      1       22      0
                     23      1       23      2
                                     24      2
                                     25      5
                                     26      2
                                     27      3
                                     28      2
                                     29      1
                                     30      4

 Total     201     Total   201     Total   201     Total   201


FURTHER EXPERIMENTS WITH WORKERS

The  Results  of  Tests  Given  to  Employees  of  Company  W 

Three tests (C-1, C-2, and the Stenquist Assembly Test) were
administered to a selected group of 73 employees of Company W.
This company is a large woolen goods manufacturing concern,
and consequently has many clerical employees who do routine
work in checking up the progress of patterns and orders through
the mill.

The subjects for testing were selected from the entire clerical
group of approximately 250 persons, as representing all degrees of
clerical ability employed in the company. The subjects were
rated by their immediate supervisors in rank order of ability.
After the several rank orders had been obtained, they were
combined into one rank order by the assistant personnel manager on
the basis of a knowledge of the ratings of those persons who were
known to two or more supervisors. The over-all ranks were
transmuted to σ's and these to convenient integral positive
numbers by the arbitrary formula, I = 7.5 + 3σ, the decimals of I
being dropped. This formula does not change the correlations,
but yields I scores ranging from 0 to 15. This over-all
transmuted ranking of the entire group of employees is known as the
"over-all criterion." In the main office, there were 37 employees
who  were  well  known  to  one  supervisor.  The  ratings  of  this 
supervisor  will  be  known  as  the  "main  office  criterion." 
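
The transmutation of ranks into integral criterion scores can be sketched as follows. The formula I = 7.5 + 3σ with decimals dropped is from the text. How ranks were first converted to σ positions is not described, so `rank_to_sigma` uses a common mid-rank normal-deviate conversion purely as an assumption; both function names are illustrative.

```python
from statistics import NormalDist


def rank_to_sigma(rank, n):
    """Assumed rank-to-sigma conversion (not stated in the report).

    Rank 1 is taken as the best of n persons; the mid-rank proportion of the
    group falling below that person is turned into a normal deviate.
    """
    proportion_below = (n - rank + 0.5) / n
    return NormalDist().inv_cdf(proportion_below)


def transmuted_score(sigma_position):
    """Criterion score I = 7.5 + 3*sigma, with the decimals dropped."""
    # int() truncates toward zero, which drops the decimals for these values.
    return int(7.5 + 3.0 * sigma_position)


if __name__ == "__main__":
    n = 73  # size of the Company W group
    for rank in (1, 10, 37, 64, 73):
        sigma = rank_to_sigma(rank, n)
        print(f"rank {rank:2d}: sigma = {sigma:+.2f}, I = {transmuted_score(sigma)}")
```

A σ position of -2.5 maps to 0 and +2.5 maps to 15, which matches the stated range of the I scores.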

The C-1 Clerical Test was administered to all the employees
under standard conditions. The C-2 Test was also administered
to all of the employees, but through a mistake in the
administration, the time limits of Tests 1 and 2 were increased each by
one minute, and that of Test 3 by two minutes, so that the total
test was given a twenty-six-minute time limit instead of the usual
twenty-two-minute time limit. The Stenquist Assembly Test
was administered to all but twelve of the employees. Because
of lack of time, one of the two groups tested was allowed to
spend only twenty-two minutes on the Stenquist Assembly Test.
The scores of these subjects were increased by arbitrary scoring
formulae to approximately the scores the subjects would have
attained had they been allowed to take the test for the full thirty
minutes.¹

Table XXIX shows the correlations of the tests with the two
criteria: the general office criterion and the over-all criterion, the
latter being further subdivided into men, women, and both
combined. The tests are the I.E.R. Clerical Test C-1; the Thorndike
Clerical Test C-2; the Thorndike Clerical Test with Test 5 taken
out; and the Stenquist Assembly Test, with the scores corrected in
the case of the short time limit group (this test likewise being
subdivided into men and women, in addition to both combined).


¹ The Method of Supplying Scores on the Stenquist Assembly Test in Company W.
One group of subjects at Company W was given the Stenquist Assembly Test with a
time limit of twenty-two minutes instead of the usual thirty minutes. It became
necessary to make an adjustment of such scores to make them as comparable as may
be with those given with the full-time limit.

The ideal method in such cases would be to give to some group the Stenquist test
and to determine their scores at the end of each successive five minutes. By
correlating the scores at each of these points with the total score one could determine the
proper regression equation to use, along with interpolation, to determine a good
approximation to the score which would be obtained in the total time. This
correlation coefficient naturally becomes higher and higher as the total time limit is
increased, ultimately becoming a perfect correlation. Consequently, if the time at
which the shorter-time group stopped the test is in the neighborhood of the total
time, there will be very little regression of the shorter-time scores upon the average of
the 30-minute test scores.

In this case such regression equations were not available. The plot of the scores
of men and women separately shows a very marked lack of overlapping. After
correction by the method below, the average Stenquist Assembly Test score of the
men was 67.1; that of the women was 36.7; the difference, 30.4, is 7.1 times the
standard error of the difference. This standard error of the difference is 4.3. This
fact is noted at this point because so large a difference between test scores of men
and women, working for the most part at the same type of work, is rarely found.
The overlapping between men and women on the Toops Clerical C-1 and the
Thorndike Clerical C-2 is almost perfect. It seems certain that when people are stopped
at 22 minutes on a test there will be greater improvement in the remaining 8 minutes
in the case of those people who make above average scores in 22 minutes. Our
arbitrary correction formula should take account of this fact. The average score
made by the men who took the test for 30 minutes was 117 per cent of the score made
by the men who took the test for 22 minutes; similarly the average score for the
women who took the test for 30 minutes was 116 per cent of the score made by those
who took the test for 22 minutes. The desired differential between the "above
average" and "below average" people can be roughly secured by the following
formulae which we have adopted.

Arbitrary formulae for raising the 22-minute scores.

To correct the men's scores:

    Raised Men: 117.24% × 22-min. score
        +5% if 56 or above in score = 122.24%
        -5% if 55 or below in score = 112.24%

To correct the women's scores:

    Raised Women: 115.62% × 22-min. score
        +3% if 28 or above in score = 118.62%
        -3% if 27 or below in score = 112.62%
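
The arbitrary formulae can be written as a small function. This is a sketch only: it assumes that the cut-off scores (56 for men, 28 for women) refer to the 22-minute score itself, which the footnote implies but does not say outright, and the function and argument names are illustrative.

```python
def raise_22_minute_score(score, sex):
    """Estimate a 30-minute Stenquist score from a 22-minute score.

    Percentages are those given in the footnote. Applying the cut-offs to the
    22-minute score is an assumption made for this sketch.
    """
    if sex == "man":
        percent = 122.24 if score >= 56 else 112.24
    elif sex == "woman":
        percent = 118.62 if score >= 28 else 112.62
    else:
        raise ValueError("sex must be 'man' or 'woman'")
    return score * percent / 100.0


if __name__ == "__main__":
    for score, sex in [(60, "man"), (50, "man"), (30, "woman"), (20, "woman")]:
        raised = raise_22_minute_score(score, sex)
        print(f"{sex:5s} 22-min score {score}: raised to {raised:.1f}")
```

Because the multiplier jumps at the cut-off, two people whose 22-minute scores differ by one point across it end up several points apart, which is the intended differential between "above average" and "below average" people.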
In the case of the entire group of 73 men and women, the C-1
clerical correlates with the over-all criterion to the extent of
.40 ± .07, and the C-2 clerical (with Test 5 out) to the extent of
.38 ± .07. The two combined by regression give r_IC = .42 ± .07.
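
The combined coefficient can be checked with the usual two-predictor multiple correlation. The sketch takes .71 as the correlation between C-1 and C-2, the figure the report gives for these two tests; whether that figure refers to C-2 with Test 5 removed is not stated, so its use here is an assumption, and the function name is illustrative.

```python
import math


def multiple_r(r_1c, r_2c, r_12):
    """Multiple correlation of a criterion with two predictors combined by regression."""
    r_squared = (r_1c ** 2 + r_2c ** 2 - 2.0 * r_1c * r_2c * r_12) / (1.0 - r_12 ** 2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # C-1 with criterion, C-2 with criterion, C-1 with C-2 (assumed .71).
    combined = multiple_r(0.40, 0.38, 0.71)
    print(f"combined correlation = {combined:.2f}")
```

Because the two clerical tests correlate so closely with each other, combining them raises the correlation with the criterion by only about .02 over C-1 alone.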

Inasmuch as a computation of the correlations both with the
rank orders and with the transmuted criterion scores (or scores
which were changed into terms of amount) gave results about the
same in each case, it seems scarcely worth while to make a similar
transmutation for the cases who were rated by one supervisor in
the main office. Correlations of the tests with the rank orders of
the main office criterion are substantially of the same magnitude
in this group as for the over-all criterion group, with the exception
that the Stenquist Assembly Test correlates .51 ± .08 with the
main office criterion and only .36 ± .08 with the over-all criterion.
The main office criterion correlates with the over-all criterion to
the extent of .87 ± .03, with N = 37.

These are our most important and most trustworthy results
from workers. They show a moderate correlation between both
Tests C-1 and C-2 and actual success on the job (.40 and .38).
Test C-2 prophesies success in work in this group a trifle better
than it prophesies average work or half-year gains in the case of
school children. Tests C-1 and C-2 measure somewhat different
abilities, the correlation between the two being .71 plus whatever
increment should be added because of attenuation. The Stenquist
Assembly Test measures notably different abilities, correlating
only .22 ± .08 and .06 ± .08 with Tests C-1 and C-2 respectively
in the case of the entire group. The Stenquist Test is
nearly as significant of success among these workers as is either
clerical test, its correlation with the over-all criterion being
.36 ± .07 in the case of the entire group. This substantial equality of
the Stenquist Test with either clerical test in signifying success
among these office workers is puzzling, but the only satisfactory
way to explain it is by further experimentation.

TABLE XXIX

The Intercorrelations of the Tests Given to Employees of
Company W*

[The body of this table is not preserved in the source.]

* The Probable Error of the Correlation Coefficients in Table XXIX may be
found from the following table: [table not preserved in the source].


Tests Given to Twenty-one Company O Applicants for
Admission to the Foreign Training Class School

At the invitation of the personnel manager of Company O the
following tests were given to twenty-one applicants for the
Foreign Training Class: Army Alpha, Toops' Business Test
C-1, Stenquist Assembly Test, Stenquist Picture Tests I and II.

In  addition,  the  following  three  social  facts  were  obtained  for 
each  individual:  Father's  occupation,  age  at  last  birthday,  and 
last  grade  completed  in  school.  Father's  occupation  was  given 
scores  on  the  following  basis: 

Unskilled or laborer     1
Agricultural             2
Skilled trades           3
Business or clerical     4
Professional             5

The  scores  on  Form  I  and  II  of  the  Stenquist  Picture  Test  were 
combined  by  adding  the  gross  scores. 

In addition to the examinations given by the Institute each
applicant took a very extended examination prepared by the
company. This examination consisted largely of practical
problems such as foreign representatives of the company might
presumably be expected to have to deal with in their foreign
business relations. The practical problems covered the following
aspects:

Interest on money              Freight
Tabulation of statistics       Economical operation and costs
Areas                          Marketing
Average prices                 Transportation
Yields from investment         Letter writing
Import duties                  Bad accounts
Letters of application

Inasmuch  as  these  men  were  minor  executives  in  the  various 
sub-branches  of  the  company  at  the  time,  these  tests  presumably 
measure  ability  akin  to  trade  test  ability,  or  they  measure  the 
proficiency  attained  in  executive  work  of  the  type  which  is  done 
by such men in the sub-offices of the company. The final percentage ranking given by the company is referred to in the tables
as  the  "Company's  test  ranking." 

The twenty-one men were ranked in order of merit in the four tests: Alpha, Stenquist Assembly, Combined Stenquist Picture, and Clerical. The intercorrelations, by the rank difference method, are given in Table XXX.
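
The rank difference method is commonly taken to be the coefficient now called Spearman's rho. The following is a minimal sketch in Python, assuming no tied ranks; the function name and the sample ranks are illustrative only and are not data from this study.

```python
def rank_difference_correlation(ranks_x, ranks_y):
    """Rank-difference coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

    Assumes each list holds the ranks 1..n with no ties.
    """
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length, at least 2")
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks of five men on two tests (not taken from the report).
alpha_ranks = [1, 2, 3, 4, 5]
clerical_ranks = [2, 1, 3, 5, 4]
rho = rank_difference_correlation(alpha_ranks, clerical_ranks)
```

With twenty-one men, as here, each pair of rankings would be passed to the same function to fill one cell of the table.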

It will be seen that the Clerical Test, Toops Over-all Ranking, Education, Army Alpha, and Combined Stenquist Picture Test, in descending order, all correlated above .50 with the Company's test ranking. By the use of the multiple ratio correlation technique it was determined that applicants could be selected on the basis of the Clerical Test, Age, Education, and Father's Occupation, a combination which would correlate to the extent of .71 ± .07 with the Company's test ranking. This examination would require 37 minutes.
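
As a rough cross-check on a combined correlation of this size, the sketch below computes an ordinary least-squares multiple correlation from the Table XXX coefficients for the four predictors. This is a swapped-in standard technique, not the multiple ratio method used in the report, so its beta weights need not agree with the importances reported for that method; the helper name is hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Bring the largest remaining entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Predictor order: Clerical C-1, Age, Education, Father's Occupation.
predictor_r = [
    [1.00, -0.13, 0.65, 0.11],
    [-0.13, 1.00, 0.14, 0.29],
    [0.65, 0.14, 1.00, 0.16],
    [0.11, 0.29, 0.16, 1.00],
]
# Correlation of each predictor with the Company's test ranking.
criterion_r = [0.60, -0.19, 0.56, 0.29]

betas = solve_linear_system(predictor_r, criterion_r)
# Squared multiple correlation is the sum of beta * criterion correlation.
multiple_r = sum(b * r for b, r in zip(betas, criterion_r)) ** 0.5
print(round(multiple_r, 2))
```

The result lands in the neighborhood of the reported .71, which suggests the two procedures agree closely on these data even though they weight the traits differently.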


TABLE XXX

Correlations Between Tests, Company O, N = 21. March 9, 1922*

                                       Alpha  Assem.   Pict.   Cler.     Age   Educ.  Father  Toops†   Co.'s
Alpha                                     ..     .12     .57     .80    -.19     .66     .23     .93     .53
Stenquist Assembly                       .12      ..     .42     .26     .37     .45     .19     .30     .28
Combined Stenquist Picture I and II      .57     .42      ..     .55     .01     .46     .21     .65     .51
Clerical C-1                             .80     .26     .55      ..    -.13     .65     .11     .94     .60
Age                                     -.19     .37     .01    -.13      ..     .14     .29    -.04    -.19
Education                                .66     .45     .46     .65     .14      ..     .16     .68     .56
Father's Occupation                      .23     .19     .21     .11     .29     .16      ..     .20     .29
Toops Over-all Ranking†                  .93     .30     .65     .94    -.04     .68     .20      ..     .58
Co.'s Test Ranking                       .53     .28     .51     .60    -.19     .56     .29     .58      ..

* The Probable Error of the Correlation Coefficients in Table XXX may be found from the following table:

        P.E.r When r =
  N       0    ±.1    ±.2    ±.3    ±.4    ±.5    ±.6    ±.7    ±.8    ±.9
 21     .15    .15    .14    .13    .12    .11    .09    .08    .05    .03

† A provisional summation of the ranks of all tests except the company's test ranking.
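
The probable-error row in the footnote is consistent with the classical formula P.E.r = .6745 (1 - r²) / √N. A minimal sketch, assuming that formula is the one behind the footnote (the function name is illustrative):

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


n_men = 21
# One entry for each of r = 0, .1, .2, ... .9, rounded to two places.
pe_row = [round(probable_error_of_r(k / 10, n_men), 2) for k in range(10)]
print(pe_row)
```

The same function gives roughly .07 for r = .71 with N = 21, in line with the ± .07 attached to the multiple correlation above.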


As  far  as  it  seems  safe  to  predict  from  the  meagre  data  on  hand, 
the  traits,  as  measured  by  the  tests,  in  decreasing  importance 
required  of  an  applicant  for  the  foreign  training  class  are: 

1.  A  high  order  of  clerical  ability. 

2.  A  high-school  or  college  education. 

3.  That  his  father  be  engaged  in  a  highly  skilled  trade  or  business, 
or  preferably  in  a  clerical  or  professional  occupation. 

4. That the applicant be a young man rather than an older one.

The multiple ratio importances (β's) of the traits in order are as follows:

Clerical Test ............ 1.00
Education ................  .72
Father's Occupation ......  .52
Age ...................... -.60

When these β weights are divided by the respective standard deviations, they become the gross score weights of the tests.

Group Differences in Test C-1

It was found possible to compare the distributions in total score of some 1704 people who have been given the C-1 Clerical Test at various times and under various circumstances. The results of such a comparison are shown in Table XXXI, where the groups are arranged in descending order of average score made when the tests are weighted by the general series of weights.

It is necessary to point out a few factors making for lack of absolute comparability in all cases. The Cornell University students, or Summer School students in a course in mental measurements, were for the most part principals and superintendents in New York State schools. They took the printed form of the test, as did also the Ac Business College accountants and the Company W group; all other groups took the mimeographed form. There may be some slight advantage gained by those taking the printed form. The B-C Business College students, typists, bookkeepers, and stenographers, the B-M Business College students, and the B-D Business School students, typists, stenographers, and bookkeepers took the mimeographed form of the test with the original experimental time limits which varied from the final time limits in the case of six of the ten tests making up the clerical scale. Inasmuch as the original total time was 38 minutes compared with 37 minutes on the other groups, and inasmuch as the tests which varied markedly from the final time limits received about the same gross score weights, it seems likely that the comparison is a fair one. Company I, Company W, Company O, the J. R. High School groups, Cornell University students, and the boys and girls of the public school groups received the test under the standard time conditions. Of these, the Company I employees, the Company O employees, the J. R. High School groups, and the boys and girls of the public school groups had the test in the mimeographed form as well as under the standard time conditions.

It is interesting to note the almost entire lack of overlapping between the public school boys and girls of ages 12 to 15 inclusive and either the business college students studying accounting or the Company O minor executives, and their rather slight overlapping also with the B-C Business College typists, bookkeepers, and stenographers. The B-D Business School pupils were soldiers in the E and R Schools of the Army in 1920 and are for the most part clerical workers who were taking a little army business school training in order to fit themselves better for their clerical duties in army work.

There is no doubt that the different clerical occupations are approximately in their correct order of general business ability: that accountants should rank higher than typists, bookkeepers, or stenographers of a like commercial college, such as B-C Business College; that they should in general rank higher in general business ability than clerical workers employed by Company I; and finally that these should be superior to unselected groups of boys and girls in the public school. This series then serves in a rough way as a series of categorical steps in general business ability. Neither the amounts of difference in general business ability between groups nor the exact ranking of these groups is known, although in general it is known that the groups are approximately in their correct positions. The
fact  that  the  school  boys  and  girls  are  at  one  end  and  the  high 
school  and  university  students  are  at  the  other  end  with  almost 
total  lack  of  overlapping  proves  that  the  clerical  test  correlates 
very highly with intelligence and academic success. The hierarchical order among the business occupations likewise demonstrates that the test does distinguish between the different levels of business ability. However, there is nothing in this to indicate that the C-1 Clerical Test is other than an extremely good intelligence test which at the same time correlates well with the different levels of general business ability. Inasmuch as most of the business groups are people who are learning business, and inasmuch as the average ages of these groups indicate that they are adults, the obvious conclusion is that the C-1 Clerical Test in the grade school at the ages 12 to 15 inclusive does not predict later business capacity any better than would any other good general intelligence test made up of a considerable number of rather non-verbal elements. It may well be that general intelligence, which functions so highly in the acquisition of success in
grade  school  work,  functions  equally  well  in  the  acquirement  of 
those  things  considered  essential  in  a  business  college.  Likewise, 
a  high  degree  of  general  intelligence  may  make  possible  a  short 
learning period for acquiring proficiency in a business occupation;
it  may  even  be  a  minimum  essential  for  entrance  to  some  of  the 
higher  level  clerical  occupations.  Once  in  such  an  occupation 
with  the  prerequisite  amount  of  preliminary  training,  it  may  be 
that  additional  increments  of  general  intelligence  function  to  no 
special  advantage  in  the  routine  work  of  such  occupations  but  are 
of  use  in  emergencies,  and  save  the  time  of  the  executive  when 
verbal  directions  are  to  be  understood  and  carried  out.  This 
does not preclude the possibility of all higher degrees of intelligence being very desirable as minimum qualifications for entrance
to  still  higher  level  business  occupations  or  to  executive  capacity 
in  such  occupations. 

Group  Differences  in  the  Stenquist  Assembly  Test  Scores 

The results of the Stenquist Assembly Test, which was given uniformly to a number of groups, together with the averages and the standard deviations of the distributions, are shown in Table XXXII. It is interesting to note that we find here a hierarchy of test scores corresponding very much to the hierarchy found in the clerical tests. The groups of men are arranged roughly in an order of general intelligence, although there is one marked exception: Company O men are superior to Company E or Company W men in general intelligence, yet average lower on the Assembly Test. The Company O men have had little mechanical experience, their fathers consisting of a larger percentage of clerical people than any of our other groups. The girls of the public school are markedly inferior to the boys on the Stenquist Assembly Test. The average fifteen-year-old girl on the Stenquist Assembly Test is not the equivalent of even the average twelve-year-old boy. The men of each group in which both men and women were tested made markedly higher Stenquist Assembly Test scores than the women of the same group, although in each of these groups the two sexes differ but little on other tests, as is usually the case where intelligence or clerical tests are given to men and women in the same school courses or same occupational work under common working conditions.

TABLE XXXII

Distribution of Scores on the Stenquist Mechanical Assembly Test, by Groups Tested

Group Tested                                      N    Average    S.D.
Univ. Students (Men), Winter Gr.                 13     74.1      15.7
Univ. Students (Men), Summer Gr.                 16     71.2      28.8
High School R, 1st-Yr. Boys                     145     70.1      17.2
Company W, Men                                   31     67.1      17.2
Company E, Men                                   86     65.3      20.6
Co. O, Foreign Training Class Applicants         ..     56.3      22.6
Univ. Students (Women), Summer Gr.               28     55.6      19.1
Univ. Students (Women), Winter Gr.               27     53.3      19.9
Company W, Women                                 57     36.7      15.6
Boys:  Age 15                                   120     44.1      19.4
       Age 14                                   151     45.2      19.6
       Age 13                                   107     38.4      19.1
       Age 12                                    39     34.0      18.9
Girls: Age 15                                    83     28.7      16.8
       Age 14                                   120     24.6      12.4
       Age 13                                    76     20.8      13.3
       Age 12                                    ..     20.4      13.2
INSTRUCTIONS FOR ADMINISTERING, SCORING,
AND RECORDING THE TESTS

The  I.E.R.  Arith.-Re.  Test 

Directions for Administering Part I of the Test (Thorndike-McCall Reading Test): The directions for administering and scoring the Reading section of this test are those prescribed in the standard instructions issued with the test blanks. The T-score is used. Time allowed: 30 minutes.

Directions for Administering Part II of the Test (Arithmetical Problems): After the papers have been distributed, face down on the pupils' desks, and the name, grade, and age of the child written at the top of the sheet, read the following directions:

"These are some problems in arithmetic. Write the answers to the problems on the blank lines at the right-hand side of the page. Use your extra blank sheet to figure on."

Time allowed: 15 minutes.

Scoring:  Score  is  number  of  answers  correct. 

Weighting  of  the  Arith.-Re.  Test.  See  Chapter  II,  page  12. 
The  weighted  score  is  the  T-score  in  the  Reading  Test  plus  three 
times  the  number  of  right  answers  in  the  Arithmetic  Test. 
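
The weighting rule just stated amounts to one line of arithmetic. A minimal sketch follows; the function name and the sample scores are illustrative only.

```python
def arith_re_weighted_score(reading_t_score, arithmetic_right):
    """Weighted Arith.-Re. score: reading T-score plus three times the
    number of arithmetic problems answered correctly."""
    return reading_t_score + 3 * arithmetic_right


# Hypothetical pupil: T-score of 52 in reading, 9 arithmetic answers right.
example_score = arith_re_weighted_score(reading_t_score=52, arithmetic_right=9)
```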

I.E.R. General Clerical Test, C-1
Directions  to  Examiners 

1.  See  that  all  subjects  are  provided  with  two  sharpened  pencils 
or  one  long  one  sharpened  at  both  ends. 

2.  Distribute  the  papers  face  up  on  the  desk  before  each  subject. 
As  the  examiner  starts  distributing  the  papers  he  says: 

"Start  at  once  filling  out  the  answers  to  the  questions  on  the 
front  page.  First,  fill  out  the  blanks  at  the  top,  and  then 
answer  every  question  beneath.  Do  not  turn  over  the  page 
until  I  tell  you  to." 

3. The examiner then explains in detail how to fill out each blank. (Examiner gives numbers in case any are to be omitted.) He passes around the room and examines each subject's paper in turn, helping the subjects to fill out the blanks wherever necessary. Have the class instructor aid in this.

4.  The  examiner  then  reads  the  following  General  Directions: 

General Directions

"This  examination  is  intended  to  test  your  ability  in  some 
of  the  simple  operations  required  in  clerical  work. 

"There  are  a  few  general  directions  which  will  apply 
throughout  the  test.  Special  directions  are  printed  at  the  top 
of  each  test  for  you  to  read  before  doing  the  test.  Now,  look  at 
your  general  directions  on  the  front  page  while  I  read  them." 

The examiner pronounces the numbers 1, 2, etc., as he comes to them.

5. "No. 1. Always wait for the signal before turning the page to a new test. Whenever I tell you to do so, turn over quickly to the next page. My signal will be like this: 'Turn over to Test 4. Begin!' That means that you are to stop Test 3 at once, turn over to Test 4, and begin immediately."

6. "No. 2. You are expected to do all you possibly can on every test. Most of the time you will not be able to finish a test before the signal comes to turn over to the next test. Do not be discouraged if you are not able to finish any of the tests in the time allowed. Few people are ever able to finish the tests."

7. "No. 3. Stop promptly at the 'turn over' signal, and begin the next test at once. There is no waiting between the tests. Even if you should get tired, do not let that keep down your speed."

8. "No. 4. Follow exactly the printed directions. After you have once begun the first test, no spoken directions will be given. These tests all require you to understand printed directions, as well as to do certain things."

9. "No. 5. Ask no questions. I will not answer any, and it will disturb others for you to ask them."

10. "No. 6. Both speed and accuracy count. Work as rapidly as you can without making mistakes. Neatness does not count if you write so that we can read it."

11. "No. 7. Cross out your answer and correct it if you make a mistake. Don't try to erase, as that would take too much time. You can usually answer two more questions while you would be erasing one mistake. Try to be correct in your first answer."

12. "No. 8. Always do some practice work on the practice pages."

13. "No. 9. Do not skip around. Usually the easier questions come first, and count just as much as the harder ones further along in the test."

14. "No. 10. Do your scribbling and figuring on the margins of the pages."

15. "No. 11. Check the correctness of your work if you get through before the 'turn over' signal."

16. "No. 12. You may refer to the directions any time you care to. However, you will waste time in doing so. Try to keep in mind just what you are to do; master the directions, and begin as quickly as possible."

17. "No. 13. Guess, when you are not sure of an answer. Don't try to answer any complete test by guessing, and guess only when you are not sure of the answer to any question. You may raise your score more by guessing than by leaving the answer blank. A blank is just as bad as a wrong answer."

18. "No. 14. If your pencil breaks, raise your hand and call 'Pencil.' Otherwise you will not speak."

19. "No. 15. Now go into this test as you would into a foot race or into a basketball game, and you will make your best score.

"Ready: Turn over to Test 1. Begin!"

20. Continue the schedule as directed in the "Time Administration Sheet." An ordinary watch set to 12 hours, 0 minutes, and 0 seconds is satisfactory for keeping time if a stop watch is not available.

The examiner must not give any further directions at any point. The directions are a part of the working time. To give more directions for any test than those allowed and provided for in the "Time Administration Sheet" means depriving subjects of time needed on the test. The examiner should take about 2½ seconds to give the direction, "Turn over to Test 4. Begin!"


Time Administration Sheet of the I.E.R. Clerical Test C-1

At the end of 0 min. say, "Ready! Turn over to Test 1. Begin." Give the corresponding signal at each later time shown:

At the end of     Turn over to           (Minutes allowed)
  0  min.         Test 1                 (3)
  3   "           2a. Practice page      (1½)
  4½  "           Test 2                 (4)
  8½  "           Test 3                 (4)
 12½  "           Test 4                 (5)
 17½  "           Test 5                 (2)
 19½  "           Test 6                 (2½)
 22   "           Test 7                 (5½)
 27½  "           Test 8                 (3)
 30½  "           Test 9                 (3½)
 34   "           Test 10                (3)
 37   "           say, "Stop! Close your books. Be sure your names are on your books."

21. Collect the papers quickly.
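
The signal times on the sheet are simply the running total of the minutes allowed for each page. The sketch below rebuilds them from the allowances as read from the sheet (the half-minute values are the reading assumed here) and confirms that they sum to the 37 minutes stated for the examination; the names are illustrative.

```python
from fractions import Fraction

# (page, minutes allowed), in the order given on the sheet.
SEGMENTS = [
    ("Test 1", Fraction(3)),
    ("Practice page 2a", Fraction(3, 2)),
    ("Test 2", Fraction(4)),
    ("Test 3", Fraction(4)),
    ("Test 4", Fraction(5)),
    ("Test 5", Fraction(2)),
    ("Test 6", Fraction(5, 2)),
    ("Test 7", Fraction(11, 2)),
    ("Test 8", Fraction(3)),
    ("Test 9", Fraction(7, 2)),
    ("Test 10", Fraction(3)),
]


def signal_times(segments):
    """Return ([(start_minute, page), ...], stop_minute) for a list of allowances."""
    elapsed = Fraction(0)
    starts = []
    for page, minutes in segments:
        starts.append((elapsed, page))
        elapsed += minutes
    return starts, elapsed


starts, stop_time = signal_times(SEGMENTS)
for minute, page in starts:
    print(f"{float(minute):>5} min: turn over to {page}")
print(f"{float(stop_time):>5} min: stop")
```

Exact fractions are used rather than floats so that the half-minute steps accumulate without rounding error.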
Directions for Scoring the 10 Unit Tests of I.E.R. C-1

Preliminary  Directions 

1.  a.  Check  all  papers  to  see  that  the  name  is  on  each  of  them. 

b. Check to see if any essential information of the questionnaire on the first page is lacking. If so, supply it, or else discard the paper.

c.  Sort  and  classify  papers,  if  groups  or  classes  are  together. 
Tie  up  in  separate  bundles  with  class  labels  written  on  a 
paper  firmly  attached  to  each  bundle. 

2.  General  Procedure. 

a.  On  each  test  determine  the  last  question  attempted.  This 
means  the  last  full  question  completed. 

b.  Draw  a  line  under  the  question  number  of  the  last  question 
attempted. 

c. Record the number attempted after "A" in the scoring box, at the lower right-hand corner of the page.

N. B. The final question "attempted" must have had some reaction made to it as indicated by a mark of some sort. If only a part of a question is attempted, do not call it an attempt. Example: Subject on Test fills out first two digits of number 20 row. The number of attempts is thus 19, and a line is drawn under the 19-question number, and "19" is entered after "A" of the scoring box.

The  attempts  are  always  the  maximum  number  of 
"Rights"  which  the  subject  could  have  received  on  his 
work.  Do  not  confuse  this  with  the  maximum  score  on 
the  test  given  by  the  work-limit  method. 

d. Refer to the special scoring directions below before proceeding with the scoring of any particular test. It has been found economical where several people are working together to score all copies of a single test at one time before proceeding to a second test. This would mean that it is best, if you have ten booklets, for example, for one person to score all ten for Test 1, another person all ten for Test 2, and so on.

e. Make a short horizontal dash through all errors and omissions and also through the staggered question numbers at the right, to show clearly the exact location of all errors (W). Use blue or red pencils to mark the errors. Make a short, horizontal dash, not a long, sweeping line.

Do  not  mark  the  correct  answers  in  any  way. 

Omissions count just the same as errors, save in Tests 1, 9, 11, 20, and 28, where whole columns may be omitted without counting either as attempts or errors. Miscellaneous skipping around will be severely discounted by counting all omissions as errors.

In all choice tests (true-false, etc.) two or more underlined answers to a question, indicating uncertainty or choice, count automatically as an error.

In  counting  errors  the  general  rule  is  to  count  as  wrong  any 
ambiguous  marking  or  any  case  where  there  is  reasonable  doubt  as 
to  the  answer  intended.  By  adhering  to  this  rule,  all  papers  are 
marked  uniformly  severely. 

In case two answers are given, the score is an error unless an attempt has obviously been made to erase or cross out one of them.
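
The general procedure above (attempts up to the last question completed, omissions counted as errors, score equal to attempts minus wrongs) can be sketched as follows. The function name and the mark labels are hypothetical, and the sketch does not handle the exception for Tests 1, 9, 11, 20, and 28, where whole columns may be omitted without penalty. A question only partly answered is assumed to be entered as blank.

```python
def score_unit_test(marks):
    """Score one unit test from per-question marks.

    marks: one entry per question, in order: "right", "wrong", or None for
    a question left blank (or only partly answered).
    Returns (attempts, wrongs, rights), where rights = attempts - wrongs.
    """
    answered = [i for i, mark in enumerate(marks) if mark is not None]
    if not answered:
        return 0, 0, 0
    attempts = answered[-1] + 1  # serial number of the last question completed
    # Everything before that point that is not right is an error or omission.
    wrongs = sum(1 for mark in marks[:attempts] if mark != "right")
    return attempts, wrongs, attempts - wrongs


# Hypothetical paper: question 3 skipped, question 4 wrong, nothing after 5.
paper = ["right", "right", None, "wrong", "right", None, None]
result = score_unit_test(paper)
```

Blanks after the last completed question do not count against the subject, which is why the trailing entries in the example are ignored.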

Special  Scoring  Directions 

Test 1.* Score of attempts equals the serial number of the last addition checked, readily determined by the scoring stencil which has the attempts numbered. Maximum Rights = 120.

Count as A the number of attempts shown by the stencil; count as W all arithmetical errors and omissions. The score (R) is A - W.

N. B. The above directions refer to the printed form. In the old form, given a gross score weight of 3 in this investigation, the wrongs only were marked; hence the gross score weight in the new printed Test 1 is only 1½.

Test 9. Score of attempts is the serial number of the last price coded. Maximum Rights = 60. Any code consistently used throughout the test may be allowed. An entirely new or original code used may be allowed if consistently and correctly used. One or more errors in a coded price makes that price an error.

Test  11.  Score  of  attempts  is  the  serial  number  of  the  last  line 
filled  out.  Maximum  Rights  =  50.  Numbers  must  be  correct  in 
every  digit.    No  partial  credits  allowed. 

Test 13. Score of attempts is the serial number of the last blank filled out plus any omissions which have occurred. Maximum Rights = 16. Writing the name of the fruit instead of giving its number is allowed. If both numbers and names are given, score according to the numbers.

Test  16.  Score  of  attempts  is  the  serial  number  of  the  last  line 
marked.  Maximum  Rights  =  30.  General  directions  apply. 
Writing  the  correct  word  is  allowed. 

Test  20.  Score  of  attempts  is  the  serial  number  of  the  last 
blank  filled  out.    Maximum  Rights  =  105. 

Test  24.  Score  of  attempts  is  the  serial  number  of  the  last  one 
filled  out.  Maximum  Rights  =  20.  Spelling  does  not  count  if 
the  word  is  not  ambiguous. 

Test  25.  Score  of  attempts  is  the  serial  number  of  the  last 
answer  underlined.  Maximum  Rights  =  30.  Two  underlinings 
to  one  question  count  as  an  error. 


*Test numbers are the Unit Test numbers always printed just to the left of the scoring box.

Test 26. Score of attempts is the number of 10's which might have been encircled up to and including the last one marked. Maximum Rights = 50. There are from 2 to 5 attempts in each row.

The figure at the right-hand margin of each row keys the number of circles which should be in that line. Check by inspection to see that all circles are around a "10" group and that there is the proper number of circles in each row. In case of an actual marked error draw a line through the error and put one dash beside the number at the right-hand edge. Make a similar dash at the right-hand edge for an omission.

The cumulative numbers in parentheses at the left are for convenience in finding attempts. For instance, if the first two 10's of line 35 are encircled, the attempts are 37. The wrongs are the number of marks placed at the right-hand edge. No stencil is needed.

Test  28.  Score  of  attempts  is  the  serial  number  of  the  last  pair 
of  parentheses  filled.  Maximum  Rights  =  40.  If  done  in  lines 
across  the  page  the  lower  groups  are  not  counted  as  omitted. 

I.E.R.  Test  C-2 
Directions  for  Administering 

Distribute a pencil, test blank, and directory to each pupil. Have pupils fill in name, address, date, age and school grade in the proper spaces on the front of the test blank; then read the following directions:

"When I say 'Begin,' open your books and begin on Test 1. When you finish Test 1, go right on to Test 2 without stopping, and then do Test 3 without stopping, and so on. Do not wait, but go right on from one test to the next. At certain times I shall tell you to begin on a new test even if you have not already begun it. Keep at work all the time. The words at the top of each test tell you what you are to do. You are to go ahead as fast as you can, working accurately.

"Ready! Turn over to Test 1. Begin!"

Cumulative  Time  Administration  Sheet 

At the end of  0 min. say, "Ready! Turn over to Test 1. Begin!" (2)
At the end of  2 min. say, "Even if you haven't finished Test 1, begin now on Test 2. Test 2." (2)
At the end of  4 min. say, "Even if you haven't finished Test 2, begin now on Test 3. Test 3." (3)
At the end of  7 min. say, "Even if you haven't finished Test 3, begin now on Test 4. Test 4." (3)
At the end of 10 min. say, "Even if you haven't finished Test 4, begin now on Test 6. There is no Test 5." (6)
At the end of 16 min. say, "Stop! Close your books."

The Scoring of I.E.R. Test C-2
On  Test  1  (underscoring  of  letters),  count  as  attempts  the  num- 
ber of  letters  that  should  have  been  underlined  up  to  and  including 
the  last  letter  underlined.  Give  as  the  score  the  number  of  at- 
tempts minus  errors  (actual  errors  and  omissions)  up  to  that 
point. 

On  Test  2  (underlining  one  digit),  if  subject  has  worked  in 
columns,  count  as  attempts  the  number  of  digits  that  should  have 
been  underlined  in  the  columns  up  to  and  including  the  last  digit 
underlined.  Give  as  the  score  the  number  of  attempts  minus  er- 
rors (actual  errors  and  omissions)  up  to  this  point.  If  subject  has 
worked  in  rows,  the  same  method  is  followed  except  to  count  as 
attempts  the  digits  that  should  have  been  underlined,  considered 
by  rows  instead  of  by  columns. 

On  Test  3  (underlining  groups  of  digits),  if  subject  has  worked  in 
columns,  count  as  attempts  the  number  of  groups  of  digits  that 
should  have  been  underlined  in  the  columns  up  to  and  including 
the  last  group  underlined.  Give  as  the  score  the  number  of  at- 
tempts minus  errors  (actual  errors  and  omissions)  up  to  this 
point.  If  subject  has  worked  in  rows,  the  same  method  is  fol- 
lowed except  to  count  as  attempts  the  groups  which  should  have 
been  underlined,  considered  by  rows  instead  of  by  columns. 

On  Test  4  (same  and  different  numbers),  if  subject  has  worked 
in  columns,  count  as  attempts  the  number  of  pairs  that  should 
have  been  underlined  in  the  columns  up  to  and  including  the  last 
pair  underlined.  Give  as  the  score  the  number  of  attempts 
minus  errors  (actual  errors  and  omissions)  up  to  this  point.  If 
subject  has  worked  in  rows,  the  same  method  is  followed  except  to 
count  as  attempts  the  pairs  that  should  have  been  underlined, 
considered  by  rows  instead  of  by  columns. 

On  Test  6  (copying  addresses)  count  as  attempts  the  last  ad- 
dress copied  and  give  as  the  score  the  number  of  attempts  minus 
errors  (actual  errors  and  omissions)  up  to  that  point.  (The  two 
addresses  given  at  the  top  as  samples  are  not  included  in  either  at- 
tempts or  rights.)  If  home  addresses  are  given  consistently 
throughout  instead  of  New  York  City  addresses,  as  required,  give 
as  final  score  90  per  cent  of  the  number  of  home  addresses  cor- 
rectly given,  most  easily  determined  by  deducting  10  per  cent, 
recording  result  to  nearest  whole  number.  If  both  New  York 
City  addresses  and  home  addresses  are  given,  compute  the  score 
on  the  basis  of  the  New  York  City  addresses  only.1 
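
The scoring rules above for Tests 3, 4 and 6 reduce to simple arithmetic. A minimal sketch in Python follows. The function names are hypothetical, and rounding an exact half upward is an assumption, since the text says only "nearest whole number."

```python
def attempts_minus_errors(attempts: int, errors: int) -> int:
    """Score = attempts minus errors (actual errors plus omissions)."""
    if attempts < 0 or errors < 0:
        raise ValueError("attempts and errors must be non-negative")
    return attempts - errors


def home_address_score(correct_home_addresses: int) -> int:
    """Test 6 score when home addresses were given consistently.

    Returns 90 per cent of the correct home addresses, recorded to the
    nearest whole number. Assumption: an exact half is rounded upward.
    """
    if correct_home_addresses < 0:
        raise ValueError("count must be non-negative")
    # Integer arithmetic avoids floating-point surprises: round(0.9 * n).
    return (9 * correct_home_addresses + 5) // 10


if __name__ == "__main__":
    print(attempts_minus_errors(42, 5))  # 37
    print(home_address_score(20))        # 18
```

Integer arithmetic is used for the 90 per cent rule so that the result does not depend on floating-point rounding.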

1 This is in accord with our general scoring principle that if the subject does more
than is required, and so penalizes himself by using up the time allotted for the test,
he is not penalized for including the additional parts of his answer.

Weighting the Tests: Add the scores on Tests 1, 2 and 3. Multiply
the gross scores on Test 4 by 4. Multiply the gross scores
on Test 6 by 10. Add the three quantities together for the subject's
weighted score on Test C-2. (For the derivation of weights
used, see Chapter V, pp. 96-97.)
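
The weighting rule for Test C-2 can be written as a one-line computation. The sketch below is illustrative only: the function and argument names are hypothetical, and the sample scores are invented for the example.

```python
def weighted_c2_score(test1: int, test2: int, test3: int,
                      test4: int, test6: int) -> int:
    """Weighted score on Test C-2.

    Tests 1, 2 and 3 are added as they stand; the gross score on
    Test 4 is multiplied by 4 and the gross score on Test 6 by 10.
    """
    return (test1 + test2 + test3) + 4 * test4 + 10 * test6


if __name__ == "__main__":
    # Invented scores, for illustration only.
    print(weighted_c2_score(30, 25, 20, 15, 8))  # 215
```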


Stenquist  Mechanical  Test 
Directions  for  Administering 

The  Series  I  of  the  Stenquist  Assembly  Test  of  Mechanical 
Ability  was  used  in  this  investigation.  This  series  consists  of  the 
following  ten  articles:  Cupboard  catch,  clothes  pin,  Hunt  paper 
clip,  chain,  bicycle  bell,  shut-off,  wire  stopper,  push  button,  lock 
No. 1, and mouse trap. This series is for sale by the C. H. Stoelting
Co., Chicago, Illinois. The procedure followed in administering
the test was that recommended by Dr. Stenquist in a manuscript
entitled "Stenquist Assembly Test of General Mechanical
Ability,  Description  and  Manual  of  Directions,"  issued  from  the 
Board  of  Education  of  the  City  of  New  York.  The  scoring  sheet 
therein  recommended  was  slightly  altered  in  order  to  make  it 
more  useful  in  quickly  recording  the  scores  secured  by  the  subject. 

The  revised  directions  for  the  use  of  these  tests  are  now  avail- 
able in  a  booklet  entitled  "Stenquist  Assembling  Test  of  General 
Mechanical  Ability,  Description  and  Manual  of  Directions," 
published by the C. H. Stoelting Co., Chicago, Illinois. The
administrative  directions  used  in  this  investigation  are  as  follows : 

1.  Distribute  one  Scoring  Sheet  and  one  box,  with  hinges  toward 
the  pupil,  to  each  pupil.    Caution  the  pupils  as  follows: 

"  Do  not  open  the  boxes  until  I  tell  you  to  do  so." 

2.  "Now  write  your  name  at  the  top  of  the  page  where  it  says 
'name,'  and  then  fill  in  the  other  blanks."  (Examiner  walks 
around room explaining how to do it wherever necessary.)
"Now, fold your papers once lengthwise (illustrating) like this,
and put the paper under your box."

3.  "Look  at  the  directions  on  the  lid  of  the  box  while  I  read 
them.  In  this  box  there  are  some  common  mechanical  things 
that  have  all  been  taken  apart.  You  are  to  take  the  parts  and 
put  them  together  as  they  ought  to  be;  that  is,  you  are  to  take 
the  parts  and  put  them  together  so  that  each  thing  will  work 
perfectly. 

"Do  not  copy  what  your  neighbor  is  doing,  but  work  ab- 
solutely by  yourself.  Keep  the  box  turned  so  that  the  hinges 
are  toward  you.  When  opened  in  this  position  the  cover  forms 
a  tray  in  which  to  work. 

"Do  not  break  the  parts.  Everything  goes  together  easily 
if you do it in the right way. Begin with Model A; then take
Model B; then C; and so on. If you come to one that you
cannot do in about three minutes, go on to the next. The
person who gets the most things right gets the highest score.

"Ready: Begin!" (Examiner sets watch to 12 hours, 0
minutes, 0 seconds.)

4.  After  3  minutes,  say: 

"Do  not  spend  more  than  about  3  minutes  on  any  one 
model." 

5.  At  the  end  of  30  minutes,  say: 

"Stop!    Put  your  paper  with  your  name  on  it  into  the  box 
and  close  the  box." 

Scoring  the  Stenquist  Assembly  Test 
At  the  outset  the  Stenquist  Assembly  Test  was  scored  on  two 
bases:  the  first  was  the  partial  scoring  method,  using  a  mimeo- 
graphed scoring  form  and  scores  recommended  by  Dr.  Stenquist; 
the  second  was  on  the  all-or-none  basis,  credit  being  given  only  for 
a  perfect  performance  of  a  given  model.  It  soon  became  quite 
evident  that  for  public  school  boys  and  girls  of  the  ages  13-15 
inclusive,  the  all-or-none  basis  would  result  in  a  great  number  of 
zero  scores  which  would  undoubtedly  decrease  the  validity  of  the 
test  and  possibly  spuriously  affect  its  reliability.  Accordingly, 
the  all-or-none  basis  was  discarded.  All  correlations  of  the 
Stenquist  Assembly  Test  herein  reported  are  for  the  raw  scores 
determined  from  the  partial  scoring  method  using  the  mimeo- 
graphed form  of  the  scoring  blank.  This  mimeographed  form  has 
been  slightly  altered  for  ease  in  scoring  and  has  been  printed  by 
the  Institute  of  Educational  Research  under  the  title,  "Stenquist 
Mechanical  Assembly  Scoring  Sheet." 

Suggestions  for  Scoring  the  Stenquist  Assembly  Test  While 
Administering  Paper  Tests 

Our  procedure  in  the  administration  of  these  tests  may  be 
interesting  to  anyone  who  wishes  to  give  several  scales  for  voca- 
tional guidance  purposes  to  the  same  subjects.  It  has  been  found 
quite  possible  for  two  examiners,  using  the  printed  scoring  sheet, 
to  score  approximately  30  boxes  in  30  minutes.  Then  in  the 
interval  of  about  10  minutes  required  for  dismissal  of  the  first 
test  group  and  entrance  of  the  second,  the  two  examiners  may  dis- 
assemble the  scored  models  ready  for  the  second  group.  This 
procedure  would  enable  a  continuous  succession  of  tests  of  30 
subjects each to be made every 40 minutes throughout the day.
However, this would be rather strenuous work for the two examiners,
and it has been found possible to combine other tests
with the Assembly Test to good advantage. The procedure followed
is: the subjects during the first 30 minutes take the Stenquist
Assembly Test, followed the next 30 minutes by the Thorndike-McCall
Reading Test, which is a test involving no additional
attention  from  the  examiner  once  the  directions  are  given  and  the 
subjects  have  begun  on  the  test.  While  the  subjects  are  working 
on  the  Thorndike-McCall  Reading  Test,  the  examiners  are  busy 
scoring  the  Stenquist  Assembly  Test.  The  Thorndike-McCall  is 
then  followed  by  other  tests  to  be  given  in  the  general  series. 
There  is  an  additional  advantage  to  be  secured  by  giving  all  tests 
at  one  sitting,  that  of  insuring  that  all  subjects  will  have  test 
scores  complete  on  all  the  different  tests,  a  feat  which  is  quite 
impossible  if  the  different  tests  are  given  on  different  days.  The 
only  restriction  here  seems  to  be  that  one  should  not  have  so  many 
tests  on  one  day  that  fatigue  will  enter.  With  pupils  of  ages  13, 
14  and  15,  two  hours  testing  does  not  seem  excessive  provided 
breathing  exercises  and  a  short  rest  interval  are  given  at  the  end 
of  the  first  hour.  This  was  done  in  all  cases  where  our  tests  were 
given  for  more  than  one  hour  at  a  time. 

Stenquist  Mechanical  Aptitude  Tests 
These  tests  are  paper-and-pencil  tests  devised  to  be  given  to 
subjects  in  groups  and  to  be  used  either  in  the  absence  of  the 
Stenquist  Assembly  Test  or  to  supplement  the  Assembly  Test. 
They  are  sold  by  the  World  Book  Co.,  Yonkers,  New  York. 

Both  Picture  Tests  I  and  II  were  given  to  all  boys  in  Public 
School  B  who  were  13  years  of  age  or  over.  Inasmuch  as  the 
standard  directions  had  not  yet  appeared  at  the  time  the  tests 
were  given,  the  directions  used  in  administering  the  tests  varied 
slightly  from  those  recommended  in  the  test  manual  now  pub- 
lished by  the  World  Book  Co.  The  directions  were  slowly  and 
distinctly  read  to  the  subjects  allowing  plenty  of  time  for  them  to 
note  the  picture  parts  referred  to  in  the  directions.  The  individual 
coaching  of  subjects  who  failed  to  grasp  the  idea  of  the  test,  recom- 
mended by  Dr.  Stenquist,  was  not  followed.  Such  a  subject  was 
merely  urged  to  "figure  it  out"  for  himself. 

A  time  limit  of  30  minutes  on  each  form  was  strictly  adhered  to. 
Few subjects of the ages 13 to 15 are unable to finish in this time,
while the majority finish rather earlier. A shorter time limit
would  undoubtedly  secure  as  good  results.  Considerable  dif- 
ficulty was  experienced  in  Test  I  by  reason  of  the  reversal  of  the 
pictures  for  the  later  exercises.  The  subjects  are  extremely  prone 
to  turn  the  paper  around  and  begin  on  exercise  6  instead  of  exer- 
cise 1,  even  though  specially  cautioned,  "Begin  on  exercise  1,  and 
not  on  exercise  6." 

The  Weighting  of  Stenquist  Picture  I  and  II 
It  seemed  desirable  to  combine  the  scores  on  Forms  I  and  II 
of  the  Stenquist  Mechanical  Picture  Test  in  order  to  save  time  in 
the  computations.  The  distribution  of  test  scores  on  each  of  the 
two  tests  was  tabulated  for  the  467  pupils  in  Public  School  B  who 
had  taken  both  of  these  tests,  and  yielded  the  standard  deviations, 
$\sigma_{I} = 13.50$ and $\sigma_{II} = 10.83$. If, then, Stenquist I raw-scores be
weighted 4 and Stenquist II raw-scores be weighted 5, the two
tests will be given almost identical partial regression true importances.
The raw-scores, rather than the T-scores, were used.
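
The arithmetic behind the weights 4 and 5 can be checked directly: multiplying each raw score by its weight multiplies that test's standard deviation by the same factor, so the two weighted spreads come out nearly equal. The sketch below uses the standard deviations reported above; the function name and the sample raw scores are hypothetical.

```python
SD_I = 13.50   # standard deviation of Stenquist I raw scores (from the text)
SD_II = 10.83  # standard deviation of Stenquist II raw scores (from the text)
WEIGHT_I = 4
WEIGHT_II = 5


def combined_picture_score(raw_i: float, raw_ii: float) -> float:
    """Combine the two picture-test raw scores with weights 4 and 5."""
    return WEIGHT_I * raw_i + WEIGHT_II * raw_ii


if __name__ == "__main__":
    # Weighted standard deviations: nearly identical, as the text states.
    print(f"{WEIGHT_I * SD_I:.2f}")    # 54.00
    print(f"{WEIGHT_II * SD_II:.2f}")  # 54.15
    # Invented raw scores, for illustration only.
    print(combined_picture_score(40, 30))  # 310
```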

The  Girls'  I.E.R.  Assembly  Test 
Directions  for  Administering  the  Girls'  I.E.R.  Assembly  Test 
The  test  may  be  given  to  groups  of  any  size,  the  size  being 
limited  only  by  the  number  of  sets  of  tests  available.  Each  pupil 
preferably  should  have  a  desk  top  or  three  feet  of  horizontal  space 
on  a  table,  upon  which  to  work.  Separate  desks  are  preferable  for 
the  reason  that  the  pupil  is  less  tempted  to  watch  his  neighbor  at 
work. 

1.  Distribute  one  test  box  and  one  scoring  sheet  to  each  pupil. 
Have  pupils  fill  out  the  information  blanks  at  the  top  of  the 
scoring  sheet,  and  then  fold  the  scoring  sheet  lengthwise  through 
the  middle  and  place  beneath  the  box  out  of  the  way. 

2.  Instruct  the  subjects  as  follows:  "When  you  open  your 
boxes  (wait  for  the  signal),  you  will  find  a  pair  of  scissors,  a  little 
box  and  some  envelopes.  First,  open  the  little  box,  which  has  a 
big  letter  A  on  it,  and  string  the  beads  just  like  the  sample.  Then 
open  the  envelope  marked  B  and  put  the  parts  together  so  that 
they  will  look  just  like  the  sample.  Then  take  the  parts  in  en- 
velope C  and  put  them  together  so  that  they  will  look  just  like  the 
sample.  Then  do  D,  E,  and  F  and  so  on.  Do  not  spend  too  long 
on  any  one  package  but  work  as  fast  as  you  can  and  do  your  work 
neatly. Ready! Open your boxes; find your scissors; and begin
on box A. String the beads so that they will look just like the
sample." 

At this point set your watch at 12 hours, 0 minutes, 0 seconds.
Allow  exactly  45  minutes,  and  give  the  signal:  "Stop!  Put  all 
your  parts,  together  with  the  scraps,  back  into  the  big  box  and 
close  your  box  quickly." 

Scoring  the  I.E.R.  Girls  Assembly  Test 

Score  the  test  according  to  the  standard  scoring  sheet  which 
may  be  secured  from  the  Institute  of  Educational  Research, 
Teachers  College.  In  pencil  draw  a  straight  line  from  the  credit 
given  out  to  the  margin  and  write  in  the  margin  the  credit  al- 
lowed. These  are  later  added  up  and  written  at  the  head  of  the 
sheet  on  the  blank  line  after  the  word  "Score."  The  following 
comments,  in  regard  to  credits  that  may  not  be  clear  from  the 
reading  of  the  scoring  sheet,  will  be  helpful. 

A.  Stringing  Beads.  If  the  subject's  model  is  all  O.K.,  give 
him  a  credit  of  10,  and  disregard  the  partial  scoring.  If  not  O.K., 
give  partial  credits  according  to  the  scoring  sheet.  (The  partial 
credits,  which  add  up  to  10,  all  align  in  a  given  column.)  Both 
loops  must  be  made  to  get  credit  for  loops.  No  credit  for  one  loop 
only.  A  bow  knot  receives  zero  on  the  partial  credit  "properly 
tied  on  card." 

B.  Inserting  Tape.  All  O.K.,  give  10.  The  credits  of  8  and  2 
are  over-all  performances  and  therefore  occur  on  the  printed  page 
in  alignment  with  the  10  and  not  with  the  partial  credits.  Thus 
8, 2, and 0 are the only credits allowable. Two holes or more
incorrect receives 0.

C.  Rosette.  All  O.K.,  10.  Bow  knot,  if  otherwise  O.K., 
receives  a  credit  of  8. 

D.  Cross  Stitch.  All  O.K.,  10.  "Corner"  means  whether 
corner  has  been  turned  exactly  like  sample.  If  the  corner  is  not 
turned  correctly,  this  means  that  the  subject  began  the  sample  at 
the wrong corner. Consequently this is a serious error. Neatness
may be given a subjective score of either 0, 1, 2, or 3, according
to the scoring sheet schedule, which shows what credits are to
be  deducted  for  puckering,  for  stitches  not  in  the  corner  of  the 
squares,  and  for  the  thread  being  too  loose. 

E.  Key  Ring.  All  O.K.,  10.  This  is  one  of  the  most  readily 
scorable  of  all  the  tests.    Follow  the  scoring  sheet. 

F.  Clip  Chain.  All  O.K.,  10.  "Joined  singly"  means  only 
one  wire  hooked  in  each  case,  but  all  in  a  straight  line.  If  the 
double  wires  are  hooked,  give  the  credit,  if  any,  which  would  be 
appropriate  for  the  number  of  links  correctly  made.  Deduct  no 
credit for the chain not being hooked on the card.

G. Tape Sewing. All O.K., 10. Pull the tape gently at all
points;  if  the  thread  pulls  out  in  even  a  single  stitch  leaving  the 
tape  loose,  the  first  four  points  credit  are  lost.  The  remaining 
judgments  are  all  quality  judgments  and  are  scored  as  per  the 
scoring  sheet. 

H.  Trunk  Tag.  The  credit  of  5  is  given  for  the  difficult  feat  of 
getting  the  lower  strap  through  the  buckles  (that  is,  the  hole 
which  is  in  the  middle  of  the  strap  being  correctly  placed  on  the 
tongue).  When  held  in  the  left  hand  with  card  up  and  buckle  at 
tips  of  the  fingers,  the  card  must  be  in  a  position  to  read. 

I.  Card  Wrapping.  All  O.K.,  10.  If  any  cross  has  \  of  the 
cross, or more, inside or touching the pencilled lines count as
"inside." Give one point each for each of the crosses correctly
made  within  the  lines.  The  crosses  must  be  made  with  the  proper 
kind  of  lapping. 

J.  Booklet.  All  O.K.,  10.  All  the  scoring  points  in  this  test 
are  subjective  points,  but  the  test  is  readily  scorable. 

K.  Trimming  Paper.  If  cut  between  the  lines,  give  one  point 
each  for  each  of  the  intervals  between  two  adjoining  pairs  of  num- 
bers that  are  passed  without  touching  either  of  the  boundary  lines. 
One  may  get  credit  on  the  first  interval,  miss  any  number  of  suc- 
ceeding ones,  and  finally  get  other  credits  on  more  difficult 
intervals. 

If  the  subject  attempts  to  trace  one  of  the  two  printed  lines, 
then  use  the  qualitative  judgments  provided  on  the  scoring  sheet. 

In the entire scale, give no credit for "partially correct" scoring
items, except in the neatness judgments which are indicated on the
scoring key in serial fashion, "0, 1 or 2." Each item, except the
neatness  judgments,  is  to  be  scored  on  the  all-or-none  basis. 

Directions  for  Assembling  the  I.E.R.  Girls'  Assembly  Test 

The  standard  specifications  for  the  I.E.R.  Girls'  Assembly  Test 
are  as  follows: 

Needleworkers'  scissors,  4 J  inch,  one  blade  pointed,  the  other 
slightly  blunt. 

A.  Stringing  Beads.  Model  consists  of  24  beads  of  red,  blue, 
and  yellow,  alternating  in  that  order  by  fours,  which  are  to  be 
strung  on  a  pink  cord  19!  inches  long.  The  cord  is  lapped  once 
back  through  the  two  end  beads  and  is  then  run  through  holes  in 
the  card  and  tied  in  a  hard  knot  behind.  The  cardboard  is  made 
of  heavy  binder's  board  2\  by  3!  inches  with  f-inch  holes  centered 
\  inch  from  the  ends.  The  box  to  contain  the  model  and  materials 
is  a  sliding  pasteboard  box,  2\  by  3!  by  \  \  inches,  inside  measure- 
ments. To  assemble,  a  pink  cord  19-J-  inches  long  is  threaded  into 
a  No.  1  darning  needle  and  is  wound  once  around  a  blank  punched 
card; 24 beads are counted and checked to see that they contain 8
of a color and are placed in the bottom of the box. On top with
cardboard  down  is  placed  the  model,  coiling  the  strung  beads  in  a 
rope;  then  above  the  model  is  placed  the  punched  card  with 
threaded  needle,  the  needle  being  underneath  and  the  box  is  then 
closed  with  the  letter  A  on  top.  After  scoring,  clip  the  cord  on 
the  subject's  product,  disassemble,  recount  and  recheck  the  beads, 
rethread  the  needle  and  reassemble.  The  cord  is  used  up  and 
must  be  supplied  from  the  storeroom. 

B.  Inserting  Tape.  The  model  is  constructed  of  one  piece  of 
white  beading,  standard  pattern,  15  inches  long,  in  which  is 
inserted one piece of white tape 15 inches long beveled to 45° at
one  end,  the  tape  being  held  in  place  by  a  metal  eyelet  in  each  end. 
To  assemble,  place  one  model,  one  piece  of  beading  and  one  piece 
of  tape,  separated,  in  envelope  B,  folding  all  together  two  or  three 
times  in  order  to  prevent  undue  rumpling.  After  scoring,  disas- 
semble and  place  back  in  envelope. 

C.  Rosette.  Model  is  constructed  of  one  stiff  cardboard  if  inch 
square  punched  with  8  holes  symmetrically  arranged  around  the 
center  in  a  circle  1  inch  in  diameter,  threaded  in  rosette  design 
with  one  pink  cord  15  inches  long,  tied  in  hard  knot.  To  assem- 
ble, place  model,  one  unused  cardboard  and  one  pink  cord  15 
inches  long,  in  envelope  C.  After  scoring,  cut  the  cord  and  use 
cardboard  over  again.    Cord  must  be  replaced. 

D.  Cross  Stitch.  Model  is  made  up  of  one  square  of  lavender 
checked  gingham,  cut  with  a  white  border,  four  dark  lavender  rows 
of  squares  each  way,  sewed  with  standard  cross  stitch  design.  To 
assemble,  thread  one  No.  5  needle  with  a  15-inch  No.  16  black 
sewing thread and knot one strand of the thread; wind threaded
needle  about  the  small  cardboard  to  keep  it  from  tangling;  place 
threaded  needle,  model,  and  blank  piece  of  gingham  in  envelope 
D.  After  scoring,  destroy  the  subject's  performance,  rethread 
needle,  replace  gingham  and  reassemble.  Both  the  gingham  and 
the  thread  are  used  up  and  must  be  replaced. 

E.  Key  Ring.  Model  consists  of  one  key  ring  and  key  properly 
assembled  on  a  heavy  piece  of  cardboard  6J  inches  by  3^.  To 
assemble,  place  model  and  the  four  separate  parts  in  envelope  E. 
After  scoring,  disassemble  and  place  again  in  envelope. 

F.  Clip  Chain.  Model  consists  of  six  No.  1  Gem  wire  paper 
clips,  assembled  in  standard  fashion  and  fastened  to  dress  hook 
supports  attached  to  heavy  binder's  board  card  6i  inches  by  3J. 
To  assemble,  place  model,  one  blank  card,  and  six  separated  clips 
in  envelope  F.  After  scoring,  disassemble  and  place  again  in 
envelope. 

G.  Tape  Sewing.  Model  consists  of  one  6j-inch  by  3§-inch 
piece  of  white  muslin,  bound  with  white  tape  J  inch  wide  of  the 
same  length,  sewed  on  with  No.  16  black  sewing  thread.  To 
assemble,  place  one  model,  one  piece  of  muslin,  one  piece  of  tape, 
one threaded and knotted No. 5 needle, wrapped on card as in D
above, in envelope G. After scoring, carefully trim off the taped
edge  and  use  the  muslin  in  two  or  three  further  administrations  of 
the  test;  rethread  the  needle,  and  reassemble  into  envelope.  The 
tape  and  thread  are  used  up  each  time,  and  the  muslin  also  after 
about  four  administrations  of  the  test. 

H.  Trunk  Tag.  Model  consists  of  one  trunk  tag  assembled 
according  to  standard  specifications,  and  fastened  to  heavy 
binder's  board  6|  inches  by  $i  inches.  To  assemble,  place  one 
model  and  the  five  separate  parts  in  envelope  H.  After  scoring, 
disassemble  and  replace  in  envelope. 

I. Card Wrapping. Model consists of one stiff binder's board
6i  by  2>i  inches  with  8  semi-circular  holes  along  each  side  sym- 
metrically placed  according  to  a  standard  templet,  wound  in 
standard  design  with  two  pink  cords  each  34  inches  long  and 
knotted  in  a  bowknot  at  each  end.  The  cords  supplied  to  the  sub- 
ject are  each  34  inches  long,  and  knotted  together  with  a  hard 
knot  at  3J  inches  from  one  end.  To  assemble,  place  one  model, 
one  blank  card,  and  one  pair  of  cords  in  envelope  I,  coiling  the 
cords  by  winding  them  around  the  hand  in  order  to  insert  them 
easier  in  the  envelope.  After  scoring,  either  untie  the  subject's 
performance,  or  clip  the  cord  and  replace  it  in  case  the  knot  is  too 
firmly  tied.    The  cord  is  ordinarily  used  up. 

J. Booklet. Model consists of pasted booklet constructed from
two  manila  cards  2f  by  3 J  inches,  hinges  made  from  a  one-inch 
square  mucilaged  blue  paper,  cut  in  three  equal  parts.  To  as- 
semble, place  in  envelope  J  one  model,  one  manila  card  6 J  inches 
by  3h  and  one  piece  of  blue  mucilaged  paper  one  inch  square. 
After  scoring,  destroy  subject's  performance,  and  replace  manila 
card  and  mucilaged  square.  The  working  material  is  all  used  up 
by  the  subject. 

K.  Trimming  Paper.  Model  consists  of  one  cut-out  inside  part 
of  standard  cut-out  design.  To  assemble,  place  one  model  and 
one  blank  cut-out  sheet  in  envelope.  After  scoring,  destroy  sub- 
ject's performance  and  replace.    The  blank  cut-out  is  used  up. 

From  the  above  it  will  be  seen  that  for  each  test  subject,  in 
case  he  attempts  every  model,  there  will  need  to  be  supplied  anew 
the  following  materials  which  are  used  up  with  each  administra- 
tion: 

1 pink cord 19 J inches long, used in the bead stringing model.
1 pink cord 15 inches long, used in the rosette model.
1 piece of gingham, used in the cross stitch design.
2 No. 5 threaded needles, threaded with 15 inches of No. 16 black
sewing thread, used in the cross stitch and tape-sewing models.
1 piece of muslin 3^ by 6J inches, used in the tape sewing model.
1 tape \ by 6} inches, used in tape sewing model.
1 pair of 34-inch pink cords, used in card wrapping model.
1 manila card 6\ by 2>i inches, used in booklet.
1 inch square of blue mucilaged paper, used in booklet.
1 printed cut-out design, used in trimming paper model.

Since  these  must  be  replaced  after  each  administration  of  the 
test,  it  is  desirable  to  assemble  large  quantities  of  these  supply 
materials  in  the  storeroom  in  order  that  testing  may  not  be  delayed 
by  the  necessity  of  spending  time  upon  preparation  of  supply 
materials.  After  all  the  envelopes  have  been  assembled,  arrange 
them  in  order,  B  to  K  inclusive,  and  place  one  of  the  boxes,  No.  A, 
and  one  each  of  the  envelopes  and  a  pair  of  scissors  in  each  of  the 
large  work  boxes.    This  completes  a  test  outfit  for  one  test  subject. 

Thurstone  Manual  Training  Test 

In  order  to  save  time  in  its  administration,  and  since  the  boys' 
course  in  wood  working  had  not  covered  instructions  in  me- 
chanical drawing  nor  in  wood  finishing  nor  the  finer  points  in 
cabinet  making,  twenty  of  the  questions  of  the  Thurstone  Manual 
Training  Test  were  eliminated  before  mimeographing  the  test. 
The  questions  eliminated  from  the  original  Thurstone  Test  are 
numbers 1, 2, 5, 6, 7, 9, 12, 21, 23, number 29 (for which was substituted
the statement "A No. 8 wood bit makes a hole J inch in
diameter"),  45,  46,  48,  50,  51,  54,  62,  64,  66,  73,  81,  88,  93.  The 
test  was  then  mimeographed  and  administered  with  a  time  limit 
of  20  minutes  (which  was  more  than  ample)  and  with  the  fol- 
lowing revised  directions: 

"Some  of  the  statements  below  are  true  and  some  are  false. 
Read  each  statement.  If  the  statement  is  true,  draw  a  line  under 
'TRUE';  but  if  the  statement  is  false  then  draw  a  line  under 
'FALSE.' If you are not sure, guess. It is better to guess than
to  leave  out  a  statement." 

Samples:
A knife is used to cut steel.  True.  False.
The hammer is used to drive nails.  True.  False.
Screws should be driven with a hammer.  True.  False.

Evaluation of the Occupations of Fathers of Test
Subjects

The  occupations  of  the  fathers  of  the  test  subjects  were  given 
credits according to the following scale:

OCCUPATION GROUP          CREDIT
Unskilled or laborer        1
Agricultural                2
Skilled trades              3
Business or clerical        4
Professional                5

These  values  are  used  in  determining  for  different  groups  the 
average  occupation  of  father.  It  is  not  claimed  for  this  scale  that 
the  intervals  between  steps  are  equal;  but,  considering  them  so  in 
the  statistical  evaluation  of  occupation,  one  secures  certain 
significant  statistical  results;  the  results  would  be  different  if  a 
different  scale  were  used.  If,  however,  one  may  secure  results  of 
value  with  this  rough  scale,  he  can  secure  still  better  results  with  a 
better  scale,  more  objectively  worked  out  and  with  the  scale 
intervals  more  nearly  equal.  The  scale  is  readily  applied,  and 
has  a  high  scoring  reliability. 
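
As an illustration of how the scale yields an "average occupation of father" for a group, the sketch below maps each occupation group to its credit and takes the arithmetic mean. The dictionary keys follow the scale above; the function name and the sample group are hypothetical, and the mean treats the intervals as equal, which the text itself notes is only an approximation.

```python
OCCUPATION_CREDIT = {
    "unskilled or laborer": 1,
    "agricultural": 2,
    "skilled trades": 3,
    "business or clerical": 4,
    "professional": 5,
}


def average_father_occupation(occupation_groups: list[str]) -> float:
    """Mean credit for a group, treating scale intervals as equal."""
    if not occupation_groups:
        raise ValueError("at least one occupation group is required")
    credits = [OCCUPATION_CREDIT[g.strip().lower()] for g in occupation_groups]
    return sum(credits) / len(credits)


if __name__ == "__main__":
    # Invented group of four fathers, for illustration only.
    group = ["Skilled trades", "Professional",
             "Unskilled or laborer", "Business or clerical"]
    print(average_father_occupation(group))  # 3.25
```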

The  Role  of  Probability  in  Making  Individual 
Recommendations 

The  recommendations  to  a  pupil  who  has  taken  the  tests  should 
be  couched  in  terms  of  probability  determined  from  a  correlation 
plot or "scattergram" between the criterion and the composite
test  score.  Kitson 1  gives  a  concrete  illustration  of  the  necessity  of 
this  by  analogy  from  the  insurance  business,  quoted  below: 

"A  man  of  thirty  years  inquires  of  an  insurance  company  if  he 
will  live  to  the  age  of  seventy.  Actuarians  have  studied  thou- 
sands of  cases  and  have  discovered  that  out  of  every  thousand 
men  who  are  sound  at  thirty,  a  fairly  constant  number,  say  one 
hundred,  become  septuagenarians.  The  company  physician 
tests  this  man  and  finds  him  sound.  But  it  does  not  tell  him: 
'Yes,  you  will  live  to  the  age  of  seventy.'  For  although  one 
hundred  in  every  thousand  thirty-year-old  sound  men  achieve  the 
septuagenary,  this  man  may  be  one  of  the  nine  hundred  who  die 
at  an  earlier  age.  Accordingly  the  physician  states  the  man's 
longevity  in  terms  of  probability  saying:  'You  have  one  chance 
in  ten  of  living  to  the  age  of  seventy.'  And  to  show  the  strength 
of  its  conviction,  the  company  is  willing  to  wager  a  specified  sum 
with  the  applicant." 


1  Kitson,  H.  D.  "Vocational  Guidance  and  the  Theory  of  Probability,"  School 
Review, Vol. 23, No. 2, 1920, pp. 143-50.

It should be obvious that, the higher the correlation of the composite
tests with the criterion, the smaller the test score limits
within  which  we  shall  be  able  to  place  an  individual's  success. 
By  a  manipulation  of  the  standard  error  of  estimate  we  can  arrive 
at  the  probability  recommendations  which  are  more  or  less 
independent  of  the  frequencies  which  occur  in  any  array  of  the 
table.  The  standard  error  of  estimate  is  a  function  of  the  correla- 
tion coefficient  rather  than  of  the  separate  arrays,  and  so  is  less 
likely to vary than the arrays of the "scattergram" which shows
the  relationship  of  criterion  scores  to  composite  test  scores. 
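
The dependence of the standard error of estimate on the correlation coefficient can be made concrete with the usual formula, standard deviation of the criterion times the square root of one minus r squared. The sketch below is not the authors' procedure. It assumes that criterion scores are normally distributed about the predicted value, and the function names and sample figures are hypothetical.

```python
import math


def standard_error_of_estimate(sd_criterion: float, r: float) -> float:
    """Standard error of estimate: sd_criterion * sqrt(1 - r**2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return sd_criterion * math.sqrt(1.0 - r * r)


def chance_of_reaching(threshold: float, predicted: float, see: float) -> float:
    """Probability that the criterion score is at least `threshold`.

    Assumption: errors of estimate are normally distributed about the
    predicted score with standard deviation `see`.
    """
    if see <= 0:
        raise ValueError("standard error of estimate must be positive")
    z = (threshold - predicted) / see
    # Upper-tail area of the normal curve, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Invented figures: criterion SD of 10 and a correlation of 0.60.
    see = standard_error_of_estimate(10.0, 0.60)
    print(f"{see:.2f}")  # 8.00
    # Chance of reaching 55 when the predicted criterion score is 50.
    print(f"{chance_of_reaching(55.0, 50.0, see):.2f}")
```

A higher correlation shrinks the standard error of estimate, and so narrows the limits within which an individual's success can be placed.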

Materials  Needed  for  Vocational  Guidance  Tests  1 
Stenquist  Boxes 

Stenquist  Hospital  box  (supplies) 

Stenquist  Pliers 

Stenquist  Scoring  Sheets 

I.E.R.  Girls'  Assembly  Boxes 

I.E.R.  Girls'  Assembly  Scoring  Sheets 

I.E.R. Clerical C-1 Blanks

I.E.R.  C-2  Blanks 

I.E.R.  C-2  Directories 

Thorndike  Arithmetic  Form  C  Blanks 

Thorndike-McCall  Reading  Form  8  Blanks 

Reliable  watch  with  second  hand  (or  stop-watch) 

Pencils 

Administrative  Directions 


1  The  test  materials  of  this  list  may  be  obtained  from  the  Bureau  of  Publications, 
Teachers College, New York City.

THE CONSTRUCTION OF CRITERIA

The  Stenographic  Criterion 

In  order  to  obtain  a  very  reliable  estimate  of  the  pupils' 
abilities  to  progress  in  the  acquirement  of  stenography,  the  follow- 
ing sources  of  independent  information  about  each  pupil  were 
obtained : 

I.  Lines  of  shorthand  taken  down  in  8  min.  15  sec.  by  the  pupil  on  a  formal 
performance  test,  Army  Test  762-A.  This  test  required  the  pupil  to  copy 
in  shorthand  notes  an  extract  from  a  mimeographed  copy,  the  notes  being 
immediately  transcribed  on  the  typewriter.  The  competitive  element 
was present, as "the first to finish" was allowed to set the time of the test;
the fastest pupil finished the test in 8 min. 15 sec., after which the remainder
of the class completed their notes after marking the place at
which  they  had  arrived  when  the  fastest  pupil  had  finished  and  had  given 
the  "stop"  signal.  This  test  was  given  at  an  average  length  of  practice 
of about 120 days; the variation in amount of practice was due to absence
or to entering school a week late, the latter factor being of small
importance  as  the  early  students  "marked  time"  until  the  later  arrivals 
caught  up.  The  theory  of  any  such  measurement  is  that  relative  standing 
at  equal  amounts  of  practice  is  highly  correlated  with  rate  of  ability  to 
learn  shorthand,  and  perhaps  well  correlated  with  final  proficiency. 

Average, $M_{I} = 36.72$; standard deviation, $\sigma_{I} = 6.54$.

II. Words attempted in transcription of the above test in 14 min. 0 sec.
This corresponds to speed of transcription. This was also competitive,
the time of 14 min. 0 sec. being the time at which the fastest pupil completed
all of the performance test.

$M_{II} = 228.90$; $\sigma_{II} = 68.79$.

III.  Number  of  errors  in  transcription  of  the  above  test.  This  corresponds  to 
accuracy  of  transcription.  "Errors"  are  purely  transcriptional  errors 
and  not  typographical  errors  made  on  the  typewriter. 

$M_{III} = 14.15$; $\sigma_{III} = 11.63$.

IV.  Average  percentage  school  marks  on  six  periodical  reviews  in  the  textbook 
theory  of  shorthand.  This  is  the  measure  of  the  amount  of  theory 
retained  until  "examination  time."  (It  is  the  aim  of  the  instructors  to 
complete  the  textbook  theory  in  six  months.) 

$M_{IV} = 88.35$; $\sigma_{IV} = 5.92$.

V. Total arithmetical sum of errors made on twenty lessons in shorthand
textbook theory upon first examination in each respective lesson. These
school marks were recorded in the teacher's grade book for each lesson.
If a "passing" mark is not obtained, the pupil is required to take over the
examination until a "passing" mark is obtained; the errors of the first of
such examinations were taken, rather than the second or third when they
occur, as better expressing the pupil's actual rate of learning and interest
in stenography. "Errors" are the teacher's count of what constitutes errors.

$M_{V} = 121.01$; $\sigma_{V} = 55.48$.

VI.  Number  of  actual  days  of  practice  required  to  pass  the  second,  or  more 
difficult,  dictation  test  at  a  speed  of  75  dictated  words  per  minute  with  a 
given minimum of 11 words per minute transcription and not over 2 or 3
transcription  errors  in  a  total  of  375  words.  This  is  an  individual 
examination  conducted  by  the  teacher.  A  dictation  test,  involving  a  less 
difficult  vocabulary,  precedes  the  more  difficult  test  here  taken  as  a 
measure  of  the  pupil's  ability  to  progress.  From  the  school  attendance 
books,  the  actual  days  of  attendance,  excluding  absences,  was  determined. 

$M_{VI} = 113.46$; $\sigma_{VI} = 10.29$.

VII.  Average  of  six  monthly  "over-all"  school  marks,  kept  in  the  registration 
office  of  the  college  as  the  permanent  record  of  the  student.  This  is  the 
record  available  for  inspection  by  prospective  employers.  The  letter 
marks were arbitrarily given numerical scores as follows: F− = 0; F = 1;
F+ = 2; G− = 3; G = 4; G+ = 5; E− = 6; E = 8; E+ = 10.

$M_{VII} = 5.35$; $\sigma_{VII} = 1.56$.

These  seven  variables  were  combined  into  one  score  by  the 
following  procedure: 

a. The averages, M's, and standard deviations, σ's, of all
variables were computed.

b. Two examiners, who had worked through the procedure of
obtaining the above seven variables and who consequently were
presumably fair judges of the comparative reliability and importance
of the different variables in predicting ability to progress
in stenography, were allowed independently to distribute 20 units
of bids to the seven variables. From the two series, given below,
a compromise weighting, W, was then decided upon after consultation
and discussion of the factors of reliability or unreliability
of each variable. The weights given by the two judges to
the seven variables are:


                I    II   III   IV    V    VI   VII   Total
Judge S         3    2    1     3     6    4    1     20
Judge T         1    2    1     5     6    3    2     20
Compromise      1    2    1     4     6    4    2     20
Criterion Score

$$X_c = \frac{W_{I}}{\sigma_{I}}X_{I} + \frac{W_{II}}{\sigma_{II}}X_{II} + \frac{W_{III}}{\sigma_{III}}X_{III} + \frac{W_{IV}}{\sigma_{IV}}X_{IV} + \frac{W_{V}}{\sigma_{V}}X_{V} + \frac{W_{VI}}{\sigma_{VI}}X_{VI} + \frac{W_{VII}}{\sigma_{VII}}X_{VII} + K, \qquad \text{(A)}$$

in which $W_{I}$, $W_{II}$, etc., are respectively the compromise weights
above; $\sigma_{I}$, $\sigma_{II}$, etc., are respectively the standard deviations of
Var. I, Var. II, etc.

K is an arbitrary constant to be taken of such magnitude that
the criterion scores will be readily handled quantities.1 It is
preferable to use the former form and multiply the gross measures
by $W_{I}/\sigma_{I}$, etc., without finding the deviations and thus being
compelled to work with negative deviations.

The formula used is then:
Criterion Score

$$X_c = \frac{1}{6.54}X_{I} + \frac{2}{68.79}X_{II} - \frac{1}{11.63}X_{III} + \frac{4}{5.92}X_{IV} - \frac{6}{55.48}X_{V} - \frac{4}{10.29}X_{VI} + \frac{2}{1.56}X_{VII}. \qquad \text{(B)}$$

The  minus  signs  of  the  above  are  to  be  particularly  noted.  The 
first  of  the  above  two  formulae  assumes  that  greater  numerical 
scores,  X's,  in  the  different  variables  always  indicate  greater  merit 
or  ability. 

Variables III and V, errors, are the reverse of this, greater
numerical scores indicating less merit; accordingly, $W_{III}$ and $W_{V}$
become minus in the specific, or second (B) equation. It will
also be noted that Variable VI, number of days' practice required
before being able to pass a difficult 75-word-per-minute dictation
test, should receive a negative sign for $W_{VI}$, since more days of
practice to come up to a given proficiency means less merit, a
slower rate of learning. The distribution of the twenty bids was
made in the abstract, i.e., disregarding signs and σ's and considering
the relative importance of the variables as dependent upon:


1 A regression value of K will also result from the simplification of the form:

$$\frac{W_{I}}{\sigma_{I}}(X_{I} - M_{I}) + \frac{W_{II}}{\sigma_{II}}(X_{II} - M_{II}) + \cdots + \frac{W_{VII}}{\sigma_{VII}}(X_{VII} - M_{VII}).$$
a. The amount of practice of which it is a measure.

b.  The  carefulness  and  objectivity  of  the  grades  or  scores. 

c.  The  amount  of  missing  or  incomplete  measures.  (It  was 
found  necessary  to  supply  school  marks,  etc.,  for  a  given  person  if 
only  a  small  amount  were  missing.  This  is  better  practice  than 
to  discard  a  pupil  on  whom  data  are  not  complete;  in  fact,  it  is  the 
only  possible  procedure  if  one  would  have  a  reasonable  number  of 
subjects  left  on  which  to  base  his  tests.  It  requires  a  judicious 
statistician  to  handle  the  mass  of  statistics  ordinarily  found  in 
school records. In place of a missing score one may supply either
(a)  the  average  score  of  all  the  subjects,  or  (b)  a  score  estimated 
by  inspection  to  be  the  average  score  to  be  derived  from  two  or 
three  regression  equations  with  other  highly  related  variables.) 

When  simplified,  the  above  equation  yields  the  multipliers1  of 
the  gross  scores  of  the  respective  seven  variables,  which  products 
we  algebraically  summate  to  obtain  the  one  composite  criterion 
score.    This  equation  is: 
Criterion Score

$$X_c = .152X_{I} + .029X_{II} - .086X_{III} + .676X_{IV} - .108X_{V} - .389X_{VI} + 1.282X_{VII}. \qquad \text{(C)}$$

One  has  now  but  to  take  the  seven  criterion  gross  scores  of 
person  A  and  multiply  them  in  turn  by  the  seven  coefficients  of 
equation  C,  and  add  algebraically  to  get  A's  single  criterion  score. 
A  did:  I,  36  lines;  II,  attempted  229  words;  III,  made  14  errors  in 
transcribing;  IV,  made  average  of  94  per  cent  on  six  reviews;  V, 
had  61  errors  in  the  total  twenty  lesson  examinations;  VI,  took 
113  days  to  pass  the  second  dictation  test;  VII,  made  an  average 
monthly  grade  of  8  as  on  the  above  enumeration.  His  criterion 
score  is  accordingly: 

$$X_c = .152(36) + .029(229) - .086(14) + .676(94) - .108(61) - .389(113) + 1.282(8) = 34.04.$$

This is equation C solved for person A. The criterion score, $X_c$,
of A is 34.04. The scores, after being similarly computed for all

1 It will be noted that the magnitudes of these new multipliers of the gross scores are
anything but in proportion to the $W_{I}$, $W_{II}$, etc., weights of regression importance.
This is pointed out here since certain test makers in the past have thought that
"multiplying age by 5 gives that variable a weight of 5." In reality they were
multiplying not by 5 but by a true relative importance of $\beta/\sigma = 5$, which figure is to
be compared by its ratio to a similarly derived one for a second variable. The unit
of relative importance of a variable in forming a criterion is $1\sigma$.
subjects, can be reduced to a basis of small numbers by means of
a  grouping  table. 
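
The arithmetic of equations (B) and (C) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the multipliers are recomputed from the printed compromise weights and standard deviations rather than copied from equation (C), and K is taken as zero because equation (C) shows no constant term.

```python
# Standard deviations and compromise weights of the seven stenographic
# variables, as printed in the text. SIGNS is +1 where a larger gross score
# means more merit and -1 where it means less (Variables III, V and VI).
SIGMAS = [6.54, 68.79, 11.63, 5.92, 55.48, 10.29, 1.56]
WEIGHTS = [1, 2, 1, 4, 6, 4, 2]
SIGNS = [+1, +1, -1, +1, -1, -1, +1]


def multipliers(weights, sigmas, signs):
    """Signed gross-score multipliers W / sigma, rounded to three places."""
    return [round(s * w / sd, 3) for w, sd, s in zip(weights, sigmas, signs)]


def criterion_score(gross_scores, coefficients, k=0.0):
    """Algebraic sum of coefficient * gross score, plus the constant K."""
    return sum(c * x for c, x in zip(coefficients, gross_scores)) + k


COEFFS = multipliers(WEIGHTS, SIGMAS, SIGNS)

# Gross scores of person A on Variables I to VII, as given in the text.
PERSON_A = [36, 229, 14, 94, 61, 113, 8]
score_a = criterion_score(PERSON_A, COEFFS)

print(COEFFS)
print(round(score_a, 2))
```

Recomputed in this way, each multiplier agrees with the printed coefficient of equation (C) to within .001, and person A's score comes to about 34.2, close to though not identical with the printed 34.04.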

The  Typing  Criterion 

The  procedure  followed  here  was  in  general  the  same  as  for  the 
stenographic criterion. The following five variables were obtained.

I. Days of actual practice, exclusive of absences, required to complete lessons
1-10 of the copy textbook. This might be called the first stage of typing
progress.

$M_{I} = 31.55$; $\sigma_{I} = 15.46$.

II. Days of actual practice, exclusive of absences, required to complete lessons
11-20 of the copy textbook. This might be called the second stage of
typing progress.

$M_{II} = 24.81$; $\sigma_{II} = 10.18$.

III.  Days  of  actual  practice,  exclusive  of  absences,  required  to  complete  lessons 
21-30 of the copy textbook. This might be called the third stage of typing
progress. Inasmuch as but few pupils had advanced farther, it was
impossible to test the pupils at their final limit of progress. Variables
I-III measure rate of progress satisfactorily.

$M_{III} = 31.90$; $\sigma_{III} = 10.38$.

IV.  Average  monthly  "over  all"  school  marks  supplied  from  the  registration 
office records, reduced to numerical values by the arbitrary scale: P+ = 0;
F− = 1; F = 2; F+ = 3; G− = 4; G = 5; G+ = 6; E− = 7; E = 8; E+ = 10.

$M_{IV} = 6.08$; $\sigma_{IV} = 1.43$.

V. The average of two independent rankings by the teacher, a week apart, of
"potential ability" in typing. This was taken because of scarcity of other
data and omissions of data in individual cases. The slip arrangement
method was used, the correlation of the rankings being $\rho = .94 \pm .01$. The
average rankings were changed into index numbers of 1 to 5, 5 per cent of
the group being given the index 1 (lowest); 20 per cent given 2, 50 per cent
given 3, 20 per cent given 4, and 5 per cent given 5 (highest).

$M_{V} = 2.98$; $\sigma_{V} = .91$.
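
The conversion of average rankings into index numbers in the 5, 20, 50, 20, 5 per cent proportions might be sketched as follows. The function name is hypothetical, and two points are assumptions not settled by the text: that a larger average ranking means greater potential ability, and that a pupil is assigned by the midpoint of his position in the ordered group.

```python
def rank_to_index(avg_ranks):
    """Map average rankings (assumed: larger = better) to index numbers 1-5.

    The lowest 5 per cent receive 1, the next 20 per cent 2, the middle
    50 per cent 3, the next 20 per cent 4, and the top 5 per cent 5.
    """
    n = len(avg_ranks)
    order = sorted(range(n), key=lambda i: avg_ranks[i])  # lowest first
    cuts = [0.05, 0.25, 0.75, 0.95]  # cumulative shares closing indices 1-4
    indices = [0] * n
    for position, pupil in enumerate(order):
        share = (position + 0.5) / n  # midpoint of this pupil's slot
        indices[pupil] = 1 + sum(share > cut for cut in cuts)
    return indices


# Twenty hypothetical pupils whose average rankings run from 1 to 20.
example_indices = rank_to_index(list(range(1, 21)))
print(example_indices)
```

With twenty pupils this gives one pupil the index 1, four the index 2, ten the index 3, four the index 4, and one the index 5.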


These five variables were independently judged for relative
importance, disregarding σ's and signs, by two examiners, and a
compromise weighting of each determined by discussion.


                I    II   III   IV    V    Total
Examiner S      5    3    2     4     6    20
Examiner T      4    5    4     3     4    20
Compromise      4    5    3     3     5    20
The equation for multiplying gross scores accordingly becomes:

$$X_c = -\frac{4}{15.46}X_{I} - \frac{5}{10.18}X_{II} - \frac{3}{10.38}X_{III} + \frac{3}{1.43}X_{IV} + \frac{5}{.91}X_{V} + K.$$

$$X_c = -.259X_{I} - .491X_{II} - .289X_{III} + 2.098X_{IV} + 5.482X_{V} + 40.$$

As will be seen above, the larger the scores of the first three variables
the less the merit, so that these three variables receive a
minus  sign  in  the  above  equations.  The  constant  term,  40, 
insures  that  the  composite  scores  Xc  will  all  be  positive  quantities. 
These  were  then  reduced  to  small  numbers  by  means  of  a  grouping 
table. 

The  Bookkeeping  Criterion 
Six  variables  enter  into  the  bookkeeping  criterion : 

I. Number of days spent in completing the first of the three divisions of the
bookkeeping course.

$M_{I} = 38.26$; $\sigma_{I} = 20.65$.

II. Percentage school mark in the first division work, as recorded in the
teacher's class grade book.

$M_{II} = 91.01$; $\sigma_{II} = 4.55$.

III. Number of days spent in completing the second of the three divisions of
the bookkeeping course.

$M_{III} = 50.33$; $\sigma_{III} = 19.77$.

IV. Percentage school mark in the second division work, as recorded in the
teacher's class grade book.

$M_{IV} = 90.38$; $\sigma_{IV} = 5.35$.

V.  Number  of  days  spent  in  completing  both  the  first  and  second  divisions  of 
the  bookkeeping  course.  This  cumulative  score  was  taken  because  of  the 
number  of  instances  in  which  it  was  necessary  to  estimate  the  days  upon 
which  a  given  student  completed  the  first  division  and  began  the  second, 
the  lessons  of  the  textbook  not  necessarily  being  taken  by  all  the  students 
in the chronological order of the textbook. This variable is free from that
attenuating factor. This represents, on the average, about five months'
practice in the bookkeeping course.

$M_{V} = 88.81$; $\sigma_{V} = 36.96$.

VI. Average monthly "over all" school marks supplied from the registration
office records, reduced to numerical values by the arbitrary scale: F = 0;
F+ = 1; G− = 2; G = 4; G+ = 5; E− = 7; E = [illegible]; E+ = 10.

$M_{VI} = 5.47$; $\sigma_{VI} = 1.95$.
The relative importances of the six variables, as estimated by the
two judges, are:

                I    II   III   IV    V    VI   Total
Examiner S      2    5    2     5     3    3    20
Examiner T      2    3    2     4     3    6    20
Compromise      2    4    2     5     3    4    20

The equation for multiplying gross scores accordingly becomes:

$$X_c = -\frac{2}{20.65}X_{I} + \frac{4}{4.55}X_{II} - \frac{2}{19.77}X_{III} + \frac{5}{5.35}X_{IV} - \frac{3}{36.96}X_{V} + \frac{4}{1.95}X_{VI} + K.$$

$$X_c = -.097X_{I} + .880X_{II} - .101X_{III} + .935X_{IV} - .081X_{V} + 2.050X_{VI} + K.$$

Variables  I,  III,  V  are  given  minus  signs  in  the  above  equations 
since  the  larger  numerical  scores  mean  the  less  merit.  A  grouping 
table  was  used  to  reduce  these  scores  to  small  numbers. 

The  General  Business  Criterion 

The criterion scores in each of the three criterion variables were
changed into multiples of the respective σ's of the three criteria.
Use was made of Table XXXIII.

Some  individuals  had  two  or  even  three  criterion  scores.  Where 
such  was  the  case,  the  two  or  three  sigma  scores  were  averaged  for 
the  final  score  of  the  individual. 
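
The conversion to sigma positions and the averaging just described can be sketched in Python. The means and standard deviations are those printed in Table XXXIII; the function names and the example individual are hypothetical.

```python
# Mean and standard deviation of each grouped criterion scale,
# as printed in Table XXXIII.
SCALES = {
    "typing": (6.51, 3.48),
    "stenography": (9.03, 2.82),
    "bookkeeping": (9.71, 3.83),
}


def sigma_position(scale, score):
    """Deviation of a grouped criterion score from its mean, in sigma units."""
    mean, sd = SCALES[scale]
    return (score - mean) / sd


def general_business_score(scores):
    """Average the sigma positions of whichever criteria the person has.

    `scores` maps a scale name to that person's grouped criterion score.
    """
    positions = [sigma_position(name, s) for name, s in scores.items()]
    return sum(positions) / len(positions)


# A hypothetical individual with a typing score of 10 and a stenography
# score of 12, and no bookkeeping score.
example = general_business_score({"typing": 10, "stenography": 12})
print(round(example, 2))
```

An individual with only one criterion score simply keeps that one sigma position as the final score.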

It will be seen that this procedure assumes that equal sigma
positions in the three criteria mean equal merit in the general
business ability criterion. Had a more refined technique been used,
such as could be used if it were possible to obtain accurate measures
of the overlapping of each of these groups upon the others, making
allowances for such by adding credit to two of the criterion scores
according to the interval of the two larger successively above the
one lowest, then higher correlations could be obtained than the
final multiple r we have obtained.

All 32 Tests
Av. of Typ. total raw scores = 638.5.
Av. of Sten. total raw scores = 623.1.
Av. of Bkkg. total raw scores = 604.0.
TABLE XXXIII
Criterion Scores
General Criterion Component = (Score - M)(1/σ)

            Typing       Stenography   Bookkeeping
σ =         3.48         2.82          3.83
1/σ =       .28735632    .35460993     .26109661
M =         6.51         9.03          9.71

Deviations: a = Score - 6.51 (Typing); b = Score - 9.03 (Stenog.); c = Score - 9.71 (Bkkg.).
σ-Positions (Components): aX = .28735632 a; bX = .35460993 b; cX = .26109661 c.

Crit.        Deviations               σ-Positions         Crit.
Score     a       b       c        aX      bX      cX     Score
  1     -5.51   -8.03   -8.71    -1.58   -2.85   -2.27      1
  2     -4.51   -7.03   -7.71    -1.30   -2.49   -2.01      2
  3     -3.51   -6.03   -6.71    -1.01   -2.14   -1.75      3
  4     -2.51   -5.03   -5.71     -.72   -1.78   -1.49      4
  5     -1.51   -4.03   -4.71     -.43   -1.43   -1.23      5
  6      -.51   -3.03   -3.71     -.15   -1.07    -.97      6
  7       .49   -2.03   -2.71      .14    -.72    -.71      7
  8      1.49   -1.03   -1.71      .43    -.37    -.45      8
  9      2.49    -.03    -.71      .72    -.01    -.19      9
 10      3.49     .97     .29     1.00     .34     .08     10
 11      4.49    1.97    1.29     1.29     .70     .34     11
 12      5.49    2.97    2.29     1.58    1.05     .60     12
 13      6.49    3.97    3.29     1.86    1.41     .86     13
 14      7.49    4.97    4.29     2.15    1.76    1.12     14
 15      8.49    5.97    5.29     2.44    2.12    1.38     15
 16      9.49    6.97    6.29     2.73    2.47    1.64     16
 17     10.49    7.97    7.29     3.01    2.83    1.90     17
 18     11.49    8.97    8.29     3.30    3.18    2.16     18
NOTES ON THEORY AND TECHNIQUE

The Multiple Ratio Correlation Technique

Test  scores  may  be  weighted  before  combining  them  into  a 
composite  scale  score  which  will  predict  the  criterion  better  than 
if  weighting  is  not  resorted  to.  The  maximum  correlation  of  the 
weighted  composite  with  the  criterion  is  secured  by  weighting 
each  test  proportional  to  its  partial  regression  coefficient.  The 
partial  regression  technique  is  so  laborious  that  its  use  is  known  to 
but  few.  The  returns,  in  increased  validity  of  a  scale  due  to 
weighting  the  tests  by  this  method,  have  frequently  seemed  small 
in  comparison  with  the  labor  involved  in  securing  them.  The  use 
of  the  method  has  been  criticised  by  many  on  the  grounds  that, 
with  the  number  of  cases  which  one  ordinarily  employs  in  an 
investigation, the partial correlation coefficients, and consequently
the partial regression coefficients, have high P.E.'s.
This objection is unfounded, however, since the multiple correlation
coefficient (or correlation coefficient which expresses the
validity of the combined or weighted scale) always has a smaller
unreliability  than  any  one  of  the  individual  partial  correlation 
coefficients.  An  average  is  a  very  stable  central  value  obtained 
from  a  widely  varying  number  of  individual  components,  the 
gross  scores;  in  the  same  way,  the  multiple  correlation  coefficient 
is  a  rather  stable  value  derived  from  a  number  of  more  unreliable 
components. 

After  one  has  a  scale  of  a  few  tests,  the  addition  of  many  more 
tests  adds  ordinarily  but  little  to  the  efficiency  of  the  shorter  scale. 
There  has  been  no  way  in  the  past  whereby  one  can  pick  out  the 
most  efficient  set  of  tests  for  combining  into  a  scale.  After  having 
approximately  determined  the  partial  regression  weights  of  a 
number  of  tests,  Kelley  eliminates  successively  the  tests  of  lowest 
partial  regression  weights,  determining  after  each  successive 
elimination  the  correlation  of  the  remaining  composite,  as 
weighted,  with  the  criterion.    Kelley  has  devised  formulae  for 
readily determining the combination1 correlation coefficient, as
each test is successively given a weight of zero, which thus
eliminates it. This procedure necessitates the solution of all the
possible intercorrelations of the tests and the determination of
either the true or the approximate regression equation, a procedure
which is very laborious when many tests are involved. A
different procedure has been used by Rosenow, who assumes that
those tests which have the highest partial correlation coefficients
of the nth order are the ones which will combine to best advantage.
Consequently, in his work with tests for predicting college marks,
he determined the partial correlation coefficients of the fourteenth
order and then combined the five tests having the highest fourteenth
order partial correlation coefficients to make a scale for
predicting college marks. This procedure is probably statistically
less sound than that used by Kelley, and the procedure is
much less systematic.

The new multiple ratio correlation technique enables one to
determine the n best tests to combine into a scale, selected from a
larger number of tests, n′. Where an adequate criterion is available,
the method thus allows one with minimum effort to determine
the n "major causes" from the n′ "causes" investigated.
When  these  n  tests  are  weighted  with  the  multiple  ratio  regression 
weights,  the  efficiency  of  the  weighted  scale  in  predicting  the 
criterion  is  given  by  the  multiple  ratio  correlation  coefficient. 

In  its  use  in  scale  making,  that  test  which  yields  the  highest 
correlation  with  the  criterion  is  taken  for  the  "backbone"  test  of 
the  scale.  Each  of  the  remaining  tests  is  then  investigated,  by 
means  of  the  appropriate  equation,  to  determine  which  test,  when 
added  to  the  tests  already  in  the  scale,  will  make  the  weighted 
composite  of  two  tests  correlate  highest  with  the  criterion.  To 
this  test  is  then  added  that  one  of  the  remaining  tests  which  will 
make  the  new  composite  of  three  tests  correlate  highest  with  the 
criterion,  and  so  on.  In  this  way  the  scale  is  built  up  rather  than 
torn down, as in Kelley's procedure. With only twenty or so tests
to  choose  from,  the  multiple  ratio  correlation  coefficient  (close 

1  The  word  "combination  correlation  coefficient"  is  here  used  to  represent  the 
correlation  between  the  criterion  and  a  combination  of  tests  which  is  weighted  with 
a  series  of  weights  which  yields  other  than  the  true  multiple  correlation  coefficient. 
In  the  above  case,  combination  correlation  coefficients  are  rough  approximations  to 
the multiple correlation coefficients, at least at the beginning of the series of successive
eliminations of tests with the lowest partial regression coefficients.

approximation to the true multiple correlation coefficient) after
the  inclusion  of  a  few  tests  in  the  scale  approaches  the  magnitude 
which  it  would  have  with  all  the  tests  in  the  scale.  Once  the  tests 
are  arranged  in  order  of  decreasing  amount  of  contribution  in 
predicting the criterion, and the appropriate multiple ratio correlation
coefficients after the inclusion of each successive test are
known,  one  may  consider  the  scale  complete  just  as  soon  as  the 
increase  in  the  multiple  ratio  correlation  coefficient  seems  not  to 
justify  the  labor  involved  in  the  administration  and  scoring  of  an 
additional  test. 
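
The building-up procedure described above can be sketched as a greedy forward selection. This sketch is not the author's method: the multiple ratio formulae are not given in this section, so the ordinary least-squares multiple correlation is used in their place. All names, the stopping threshold `min_gain`, and the example data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting.

    Assumes the matrix is non-singular (no two tests perfectly correlated).
    """
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def multiple_r(subset, r_tests, r_crit):
    """Multiple correlation of the criterion with the tests in `subset`."""
    rss = [[r_tests[i][j] for j in subset] for i in subset]
    rc = [r_crit[i] for i in subset]
    betas = solve(rss, rc)  # standardized partial regression weights
    return sqrt(max(0.0, sum(b * r for b, r in zip(betas, rc))))


def build_scale(tests, criterion, min_gain=0.01):
    """Add tests one at a time; return [(test index, R after adding it)]."""
    k = len(tests)
    r_tests = [[pearson(tests[i], tests[j]) for j in range(k)] for i in range(k)]
    r_crit = [pearson(t, criterion) for t in tests]
    chosen, history, current = [], [], 0.0
    remaining = list(range(k))
    while remaining:
        best = max(remaining,
                   key=lambda i: multiple_r(chosen + [i], r_tests, r_crit))
        r_new = multiple_r(chosen + [best], r_tests, r_crit)
        if chosen and r_new - current < min_gain:
            break  # the gain does not justify another test
        chosen.append(best)
        remaining.remove(best)
        history.append((best, r_new))
        current = r_new
    return history


# Hypothetical scores of eight persons on three tests and a criterion.
t0 = [1, 2, 3, 4, 5, 6, 7, 8]
t1 = [2, 1, 4, 3, 6, 5, 8, 7]
t2 = [1, -1, -1, 1, 1, -1, -1, 1]
crit = [2 * a + 3 * b for a, b in zip(t0, t2)]
history = build_scale([t0, t1, t2], crit)
print(history)
```

In this made-up example the first test becomes the "backbone," the third test is added next, and the second test is left out because it adds nothing once the first is in the scale.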

The  technique,  the  discovery  of  which  was  motivated  by  the 
need  for  a  rapid,  systematic  method  of  weighting  tests,  has  grown 
up  by  a  series  of  successive  discoveries  of  formulae  extending 
through  the  year  of  the  investigation.  Consequently,  at  the 
time  when  its  use  was  first  undertaken  in  this  problem,  the  final 
elaboration  was  not  available.  It  was  therefore  impossible  to 
secure  the  greatest  value  of  the  technique  in  this  investigation. 
Its  use  has  been  much  simplified  by  the  preparation  of  printed 
form  charts  so  that  the  selection  of  the  five  best  tests  out  of,  say, 
fifteen,  is  but  the  work  of  a  few  hours  at  the  present  time.  With 
the  revised  technique  it  is  necessary  to  solve  but  a  few  of  the 
possible  intercorrelations  of  the  tests  if  one  is  selecting  the  n  best 
tests  out  of  a  larger  number. 

A  Note  on  Scoring  Formulae 

Where  true-false  tests  are  given,  the  common  practice  is  to  score 
the  test  by  the  scoring  formula:  score  equals  rights  minus  wrongs. 
The  procedure  is  justified,  even  on  the  part  of  those  who  profess  to 
disbelieve  in  the  practical  value  or  statistical  validity  of  using 
partial regression equations for determining the weights of individual
tests, on an a priori assumption that, inasmuch as one
may  presumably  get  half  of  the  questions  right  by  chance,  he 
should  be  discounted  an  amount  which  would  (1)  give  a  score  of 
100  per  cent  to  the  person  who  has  no  errors  and  (2)  give  a  score  of 
zero  to  a  person  who  gets  50  per  cent  of  the  questions  correct.  If 
the  question  were  one  of  mere  chance,  as  here  assumed,  the  scoring 
formula  above  would  adequately  take  care  of  the  situation.  But, 
the situation which would be chance in a dice box is not necessarily
at all psychologically a chance situation. As a matter of
fact, if it be assumed that a person is totally ignorant of the proper
answer to a given question, he is frequently predisposed to underscore
the one answer rather than the other by some psychological
element  in  the  question.  Frequently  this  same  hypothetically 
ignorant  individual  could  be  made  to  underscore  the  opposite 
answer  to  the  one  which  he  did  first  by  the  simple  expedient  of 
changing  a  word  with  positive  intent  so  as  to  become  one  of 
negative  intent. 

Wherever such scoring formulae are used, the implicit assumption
is that the use of such a scoring formula will yield better
correlations  with  the  criterion  to  be  predicted  than  if  the  scoring 
formula  were  not  used.  If  a  valid  criterion  is  available  it  is  a 
simple  matter  to  determine  the  correlation  with  the  criterion 
which  would  result  from  the  use  of  any  given  scoring  formula. 
Let  Rights  always  be  weighted  1.00  in  the  scoring  formula,  and 
let  errors  be  weighted  an  amount,  C,  which  may  be  either  a 
positive  or  negative  quantity  of  any  amount. 

Let $r_{IR}$ be the correlation between the criterion (I) and the Rights
(R).

Let $r_{IW}$ be the correlation between the criterion and the
Wrongs (W).

Let $r_{RW}$ be the correlation between the Rights and the Wrongs.
Then the correlation of the criterion with
the score, $S = R + C \cdot W$, is given by the formula:

$$r_{IS} = \frac{r_{IR}\,\sigma_{R} + r_{IW}\,C\,\sigma_{W}}{\sqrt{\sigma_{R}^{2} + C^{2}\sigma_{W}^{2} + 2\,r_{RW}\,\sigma_{R}\,C\,\sigma_{W}}}$$
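
The formula can be checked numerically with a short Python sketch, which computes the correlation of the criterion with the score R + C·W both from the formula and directly from the scores. The data for six persons are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation about the mean (dividing by N)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))


def correlation_with_score(r_ir, r_iw, r_rw, sd_r, sd_w, c):
    """Correlation of the criterion with S = R + C*W, from the formula."""
    numerator = r_ir * sd_r + r_iw * c * sd_w
    denominator = sqrt(sd_r ** 2 + c ** 2 * sd_w ** 2
                       + 2 * r_rw * sd_r * c * sd_w)
    return numerator / denominator


# Invented criterion scores, Rights and Wrongs for six persons.
criterion = [3, 5, 4, 8, 7, 9]
rights = [10, 12, 11, 16, 15, 18]
wrongs = [6, 5, 7, 2, 4, 1]

r_ir = pearson(criterion, rights)
r_iw = pearson(criterion, wrongs)
r_rw = pearson(rights, wrongs)


def by_formula(c):
    return correlation_with_score(r_ir, r_iw, r_rw, sd(rights), sd(wrongs), c)


def directly(c):
    scores = [r + c * w for r, w in zip(rights, wrongs)]
    return pearson(criterion, scores)


for c in (-1.0, -0.5, 0.0):
    print(c, round(by_formula(c), 4), round(directly(c), 4))
```

Trying a range of values of C in this way shows which weighting of the Wrongs correlates best with a given criterion, which is the comparison the text recommends.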

It  should  not  be  forgotten  that  from  the  statistical  point  of  view, 
the  scoring  formula  S=R  is  just  as  truly  a  scoring  formula  which 
gives  Wrongs  a  perfectly  definite  weight  as  any  other  of  the  very 
many  scoring  formulae  which  might  be  adopted.  When  we  use 
this  formula  we  are  implying  either  that  (1)  the  Wrongs  have  a 
partial  regression  contribution  of  zero  or  that  (2)  the  contribution 
is  so  slight  as  not  to  justify  the  use  of  a  scoring  formula,  in  which 
case  C  may  be  said  to  be  zero  when  taken  to  the  nearest  integral 
value.

Where a criterion is available, other weightings for errors than
a discount of 1.00 each are the rule rather than the exception in
true-false tests. In the same way, there is no more justification
for using an arbitrary scoring formula on recognition tests involving
three or more alternatives than there is in the case of the
true-false tests.

The most helpful point of view to be taken in regard to weighting
tests or in using scoring formulae is a rather simple one. This
point  of  view  is  that  there  is  no  right  nor  wrong  way  to  weight 
tests,  elements  of  tests,  or  to  give  relative  weights  to  speed  and 
accuracy;  that  with  a  specific  purpose  in  mind,  we  should  use 
those  weights  and  those  scoring  formulae  which  will  give  us  the 
maximum  predictive  value  for  the  minimum  of  effort.  As  a 
matter  of  fact,  the  scoring  formula  S  =  R-\-C  -  W  is  but  a  very 
crude  approximation  to  the  intricate  scoring  formula,  involving 
various  exponential  functions  of  the  gross  measures,  which  we 
might  have.  At  the  present  stage  of  our  tests,  few  people  would 
think  of  devoting  their  time  to  the  calculation  of  exponential 
scoring  formulae.  Such  weighting  of  tests,  test  elements,  speed 
and  accuracy  may  possibly  mark  the  development  of  a  more 
complicated  mathematical  procedure  in  tests  once  we  have  criteria 
adequate  enough  to  justify  such  refined  technique.  For  that 
matter,  it  may  be  true  that  even  now  a  sliding  scale  system  of 
weighting  such  factors  would  yield  returns  of  sufficient  value  to 
justify  their  use.  It  is  true  that  partial  regression  equations 
based  on  assumptions  of  linearity  (that  is,  that  the  weight  of  a 
gross  score  X  is  a  constant  whatever  the  value  of  X)  are  the 
simplest, or limiting, case of more complicated curvilinear
functions. 

The  use  of  any  scoring  formula,  the  simplest  of  which  is  S  =  R, 
makes  then  the  same  implicit  assumptions  (although  empirically 
not  ordinarily  so  very  bad  ones)  as  are  made  when  test  makers  add 
the  gross  scores  on  all  tests  often  with  the  naive  faith  that  thereby 
they  are  giving  "no  undue  weight  to  any  one  test"  believing  that 
they  are  thus  keeping  out  of  error  by  their  excessive  conservatism. 
It frequently happens that the scoring formula S = R gives acceptable
correlations and it also frequently happens that weighting
tests  by  adding  the  gross  scores  (which  practically  never  weights 
the  tests  equally)  also  yields  very  good  correlations  with  the 
criteria  at  hand.  There  is  no  scientific  merit  whatever  to  the 
statement,  "In  the  absence  of  an  adequate  criterion,  I  preferred 
not  to  weight  my  tests."  If  any  scores  are  given,  weights  are 
given.  The  method  of  giving  valid  arbitrary  importances  to 
variables is discussed under the criteria of the C-1 Clerical Test.
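
The remark that adding gross scores "practically never weights the tests equally" can be illustrated numerically. The sketch below uses two short, invented score lists (hypothetical data, not taken from this study) and a hand-written product-moment correlation. It is an illustration only: it shows that the test with the larger spread of scores dominates the simple sum.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical gross scores of five subjects on two tests.
test_1 = [1, 2, 3, 4, 5]        # small spread of scores
test_2 = [10, 50, 30, 20, 40]   # large spread of scores

# "Unweighted" composite: the gross scores are simply added.
total = [a + b for a, b in zip(test_1, test_2)]

r_total_1 = pearson(total, test_1)
r_total_2 = pearson(total, test_2)
print(round(r_total_1, 3), round(r_total_2, 3))
```

In this invented case the sum follows the second test almost exactly, although no weight was deliberately assigned to either test.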

The Reliability of Selection of 140-Question General
Trade Test from a List of 204 Questions

The  general  trade  test  of  204  questions  which  was  administered 
during  the  summer  school  of  1920  to  several  hundred  soldiers  was 
reduced  to  140  questions  largely  on  a  subjective  basis  of  the 
known  difficulties  in  scoring  and  the  likelihood  of  the  retained 
questions  being  an  adequate  sampling  of  the  different  trade 
groups  in  which  the  questions  were  classified  in  the  revised  140- 
question  set  of  the  army  general  trade  test.  The  scores  on  the 
140  questions  of  191  men  in  the  summer  school  who  had  taken  the 
longer  form  were  correlated  with  the  total  scores  on  the  longer 
form, yielding a reliability coefficient of .978 ± .003. It is a
question, of course, whether such high reliability would be secured
upon a second giving of the 140-question set; but the magnitude of
the correlation coefficient demonstrates that the general trade test,
presumably a good measure of interest in mechanical things, has a
very high reliability.

The  Reliability  of  the  One-Word-Answer  Trade  Test 

After  the  summer  school  students  in  vocational  courses  at 
Camp Grant, Ill., in 1920, had completed their six weeks of
summer  school  instruction,  they  were  given  final  examinations  in 
the  one-word-answer  form  covering  their  respective  courses.  The 
reliabilities of this form of examination are shown below, computed
by correlating the scores on the odd-numbered questions with the
scores on the even-numbered questions. These correlations are
uniformly high. The column headed N gives the number of cases on
which the reliabilities of the various examinations are based; that
headed r11 gives the reliability coefficient for the odd-numbered
with the even-numbered scores; that headed r22 gives the reliability,
by Brown's formula, of Form A with Form B of the examination of the
present number of questions, n, in each case; that headed r50(50) is
the reliability by Brown's formula of a 50-question set of the
present type of examination with another 50-question set, that is,
taking enough multiples of the present r22 examination that there
would be 50 questions in the set in place of the number, n, which is
now given in the second column of figures of the table.

TABLE XXXIV

Reliability of One-Word Answer Trade Test Final Examinations
Given to the Summer School Students, Camp Grant, 1920

Examination         Av. Score   No. of Questions (n)     N      r11     r22    r50(50)

[name missing]         10.0             40               34     .699    .823    .854
Welder                 16.4             40                8     .720    .837    .865
Motorcyclist           10.6             38               10     .729    .843    .876
Plumber                14.1             40               16     .761    .864    .888
Draftsman              27.1             45               14     .780    .876    .887
[name missing]         20.0             49              100     .885    .939    .940
[name missing]         17.9             52               30     .898    .946    .944
Storage Battery         6.8             18               21     .801*   .890    .957

Prognosis Tests:
6A (Trades)            51.5            204              102     .886    .940    .792
6B (Scattered)         58.2            204              102     .919    .958    .847

* Spuriously high; too many zero scores.
r11 is the reliability coefficient of half of the test against the other half.

$$ r_{22} = \frac{2\, r_{11}}{1 + r_{11}} $$

$$ r_{50(50)} = \frac{\frac{100}{n}\, r_{11}}{1 + \left(\frac{100}{n} - 1\right) r_{11}} $$
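
The odd-even procedure and Brown's formula used for this table can be sketched in a few lines. The item matrix below is invented for illustration, and the function names (`odd_even_scores`, `brown`, `r_50_50`) are hypothetical labels, not anything used by the authors. The last lines re-derive the r22 and r50(50) entries of the first row of the table from its printed half-test coefficient.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def odd_even_scores(item_rows):
    """Split each subject's item marks (1 = right, 0 = wrong) into
    totals on the odd-numbered and the even-numbered questions."""
    odd = [sum(row[0::2]) for row in item_rows]    # questions 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_rows]   # questions 2, 4, 6, ...
    return odd, even

def brown(r, k):
    """Brown's formula: reliability of a test k times as long as the
    test whose reliability is r."""
    return k * r / (1 + (k - 1) * r)

def r_50_50(r11, n):
    """Reliability of one 50-question form against another, given the
    half-test reliability r11 of an n-question examination. Each half
    holds n/2 questions, so the lengthening factor is 100/n."""
    return brown(r11, 100 / n)

# Hypothetical item matrix: four subjects, six questions.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
odd, even = odd_even_scores(items)
r11_toy = pearson(odd, even)

# First row of the table: r11 = .699 on a 40-question examination.
r22_row1 = brown(0.699, 2)
r50_row1 = r_50_50(0.699, 40)
print(round(r22_row1, 3), round(r50_row1, 3))
```

The recomputed r50(50) agrees with the printed .854 to within a unit in the third decimal place, which is as close as three-place inputs allow.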


The unusually high reliability of such trade test forms of
questions points to the advisability of using this type of
examination wherever it is possible to do so. It may readily be applied
to any form of vocational interest test, where it will in all
probability measure interest better than the true-false type of
examination. One may more surely be said to be interested in a
subject if he can recall something about it (that is, remember it
to the point of recall) than if he merely can recognize it upon its
presentation. This a priori argument holds, of course, only for a
single item of information. The respective merits of the
one-word-answer form of question and of other forms of test reaction
are only to be settled in terms of the validity of the respective tests.
They are not even to be settled on the grounds of reliability. If the
test has high enough validity, we do not mind whether it is as
reliable as it might be or not. A test which is unreliable can
always be improved by lengthening the time, increasing the
number of elements, improving the directions, deleting obscure
phrases and questions, or, as recommended by Herring, making a
detailed psychological analysis of each question based upon the
introspective reactions of subjects who have taken the test,
thereupon rewording the element until it secures the desired response
from those of superior ability.

The Reliability of the Unrevised Mechanical Interest Test

The original edition of the Mechanical Interest Test was devised
by Dr. Edgar Rice. It contained forty-six questions, some
of which had as many as four tools required as an answer, while
eight were of a different type, one-word-answer questions in
answer to the general question, "What is the name of tool No.
14?" The reliability by the odds-evens method was computed,
the  subjects  being  the  272  pupils  enrolled  in  the  Camp  Grant 
Summer  School,  which  included  those  studying  machine  shop, 
electrical,  automotive,  men  and  women  teachers,  and  business 
students.  The  reliability  coefficient  of  the  odd-even  scores  is  .80. 
By  the  application  of  Brown's  formula,  Form  A  correlates  with 
Form  B  of  the  same  test  of  46  questions  to  the  extent  of  .89,  which, 
reduced  to  our  common  basis  of  50  questions  on  Form  A  with 
Form B, yields the high reliability of r50(50) = .894. This
reliability was subsequently increased to .923 by care used in revision
of  the  test  by  eliminating  or  changing  the  wording  of  those 
questions  whose  meaning  was  obscure,  by  eliminating  entirely 
the  eight  questions  of  the  totally  different  type  which  required  a 
different  mental  set,  and  by  the  adoption  of  the  scoring  formula 
which  yields  highest  reliability.  Such  revision  always  results  in 
increased  reliability  of  the  test.  It  also  tends  to  increase  the 
validity  of  the  test,  although  not  in  proportionate  amount.  One 
may usually select n questions from a larger number, n', which will
statistically yield a much higher validity. It is not known
whether this statistically attained validity would persist upon
re-examination by means of a test consisting of the selected questions
only, although there is no reason for believing that the increase
would disappear entirely or even to half its extent. The same
assumptions are made regarding separate tests taken from a
longer scale and given differential weights. Weighting of the
answers  of  individual  questions  may  be  resorted  to.  Thus  far, 
test  makers  have  investigated  this  possibility  but  little  further 
than  to  eliminate  those  questions  which  have  a  low  or  negative 
correlation  with  the  criterion. 

The Scoring Method of the Revised Mechanical Interest Test

This  investigation  is  interesting  from  the  point  of  view  of  the 
technique  involved.  The  Mechanical  Interest  Test  asks  at  the 
beginning, "What tools are used [followed by individual numbered
situations, such as] to saw up railroad ties into firewood lengths?"
In  some  cases  the  answer  will  be  the  number,  taken  from  the 
pictures,  of  a  single  tool,  and  in  some  cases  the  numbers  of  two 
tools.  There  are  thus  three  possible  ways  in  which  scores  might 
be  computed. 

A.  Count  one  error  for  each  number  not  correctly  given  in  the 
brackets  as  indicated  below,  one  credit  accordingly  being  given 
for  each  number  which  is  right.  This  would  mean  that  questions 
requiring two answers would receive two points credit and questions
having only one number as answer would receive only one
point  credit  if  entirely  correct;  and  one  error  would  be  counted  for 
each  number  incorrect. 

B.  If either or both answers to a double-answer question are
incorrect, score it zero, and score it 1 only when both are correct;
give one credit to a one-answer question if it is correct.

C.  Give one credit to each part of a double-parenthesis answer
as in (A), or a total of two credits to a fully correct
double-parenthesis answer, and likewise two credits to a correct
single-parenthesis answer.
This  amounts  to  giving  each  question  two  credits  and  using  partial 
credits  in  the  case  of  double-answer  questions. 

This  technique  assumes,  in  the  absence  of  a  better  criterion, 
that  the  best  scoring  method  is  the  one  which  has  the  highest 
reliability. 
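
The three scoring methods can be compared on a single paper with a short sketch. The representation of a paper as a list of questions, each a list of True/False marks (one mark per parenthesis), and the function names are assumptions made for illustration. Only the credits are computed; the error counts mentioned under (A) are omitted.

```python
def score_question(marks, method):
    """Credits earned on one question.

    marks  -- list of booleans, one per parenthesis (one or two
              entries), True where the tool number given is right
    method -- "A", "B", or "C", as described in the text
    """
    if method == "A":    # one credit for each correct number
        return sum(marks)
    if method == "B":    # all or none: one credit only if every number is right
        return 1 if all(marks) else 0
    if method == "C":    # two credits per question, shared among its parentheses
        return 2 * sum(marks) // len(marks)
    raise ValueError("method must be 'A', 'B', or 'C'")

def score_paper(paper, method):
    """Total credits for a whole paper under one scoring method."""
    return sum(score_question(marks, method) for marks in paper)

# Hypothetical paper: a correct single answer, a half-correct double
# answer, a fully correct double answer, and a wrong single answer.
paper = [[True], [True, False], [True, True], [False]]
scores = {method: score_paper(paper, method) for method in "ABC"}
print(scores)
```

Under method C every question carries the same maximum of two credits, which is the sense in which it "amounts to giving each question two credits."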

The failures by sentences and by test subject were plotted,
enabling quick computation of scores by the three scoring methods.
A stencil form greatly speeded the original
entry. For future work of this kind, prepared cross-section paper
will be found to be very helpful.

The  following  reliabilities  result: 

Scoring Methods of Mechanical Interest Test (E.R.T.-5)
N = 223 cases

Scoring Method               For Errors      Take Off    Reliability     Reliability of
                             as Indicated    Points      of Halves       Whole Test

A. Scoring                   (1) (1)         2           r11 = .776      .876
                             (1)             1

B. Scoring (either wrong)    ( ) ( )         1           r11 = .808      .896
                             ( )             1

C. Partial Credit Scoring    (1) (1)                     r11 = .849      .912
                             (2)

The  results,  taking  as  the  best  scoring  method  the  one  with 
highest  reliability,  show  that  a  partial  credit  scoring  method  of 
two  points  for  each  question  correct  and  a  partial  half  credit  of 
one  point  for  one  of  two  parentheses  correctly  filled  out  is  best. 

The adopted method is then the partial credit method, where
r22, Form A with Form B, is .912. Reduced to a 50-question
basis, r50(50) = .923. This figure is to be compared with .894 of
the unrevised 46-question set of this test.

The  following  two  tables  show  the  high  reliabilities,  by  the 
odds-evens  method,  obtained  for  two  tests  extensively  used  in  the 
E  and  R  Schools  in  measuring  mechanical  interest  and  amateur 
knowledge  of  mechanical  things. 


TABLE XXXV
Mechanical Interest Test Reliabilities

Edition                        No. of       No. of     r11     r22              r50(50)
                               Questions    Cases              (Form A and B)   (Form A50 with Form B50)

Rice Form (Unrevised)             46          271      .796    .886             .894
1920-1921 Revised Edition         45          223      .849    .920             .928
1921-1922 Second Edition          26          198      .717    .836             .906

TABLE XXXVI
General Trade Test Reliabilities

Edition                                         No. of       No. of     r11     r22     r50(50)
                                                Questions    Cases

6A (arranged by trade groups)*                    204          102      .886    .940    .792
6B (scattered)†                                   204          102      .919    .958    .847
Revised 140-Set (1920-1921)                       140          191      Correlation with 6B 204-question form = .978
Revised 40-Question Set (1921-1922) (Scaled)       40          196      .792    .884    .905
Revised 50-Question Set (1921-1922)                50

* All blacksmith questions arranged together, all carpenter, etc.

† Questions from various trades arranged in random order. Since the reliability
of the random order is highest, it might seem that a random order is most desirable
in interest questions; this value is somewhat offset by the desirability of knowing the
field of the test subject's greatest interest.


The  Effect  upon  Validity  of  Doubling  the  Length  of  a  Test 

Let $r_{I1}$ be the correlation of the criterion and the first giving of
the test.

Let $r_{I2}$ be the correlation of the criterion and the second giving
of the test.

Let $r_{12}$ be the reliability coefficient of the test.
Now, $r_{I1}$ is approximately equal to $r_{I2}$, or it may be assumed as
equal. And the sigmas are equal in two such alternative forms
of test.

The gross score weight of Test 2 with respect to Test 1 when its
gross scores are given a weight of 1.00 is given by the formula

$$ W_2 = \frac{\sigma_1}{\sigma_2}\, \beta_{2/1}, $$

in which

$$ \beta_{2/1} = \frac{r_{I2} - r_{I1}\, r_{12}}{r_{I1} - r_{I2}\, r_{12}}. $$

Now since $r_{I1} = r_{I2}$, $\beta_{2/1} = 1.00$; and since $\sigma_2 = \sigma_1$, it follows that
$W_2 = 1.00$.

This  means  that  if  we  add  the  gross  scores  of  Form  2  to  the 
gross  scores  of  Form  1,  we  shall  have  a  test  just  twice  as  long  as 
formerly  and  shall  be  weighting  the  two  forms  equally  by  the 
partial regression equation. If the variabilities are not equal, we
may take $W_1 = 1.00$ and $W_2 = \sigma_1 / \sigma_2$, in which $W_1$ and $W_2$ are gross
score weights. If the two forms are not of identical difficulty, the
different averages in themselves do not affect the formulae below;
however, tests with markedly different average scores are likely
not to fulfill the necessary assumption above that $r_{I1} = r_{I2}$.
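
The gross score weight just described can be computed directly. The following is a minimal sketch, assuming the weight is the ratio of the two regression coefficients multiplied by the ratio of the standard deviations, as the text states; the function names and the numerical values are hypothetical.

```python
def beta_2_over_1(r_i1, r_i2, r_12):
    """Ratio of the partial regression coefficient of Form 2 to that
    of Form 1 in predicting the criterion I."""
    return (r_i2 - r_i1 * r_12) / (r_i1 - r_i2 * r_12)

def gross_weight_2(r_i1, r_i2, r_12, sigma_1, sigma_2):
    """Gross score weight of Form 2 when Form 1 is given weight 1.00."""
    return (sigma_1 / sigma_2) * beta_2_over_1(r_i1, r_i2, r_12)

# Hypothetical alternative forms: equal validity (.50), reliability .80.
ratio = beta_2_over_1(0.50, 0.50, 0.80)

# Same forms, but Form 1 has twice the variability of Form 2.
w2 = gross_weight_2(0.50, 0.50, 0.80, sigma_1=10.0, sigma_2=5.0)
print(ratio, w2)
```

With equal validities the ratio of coefficients is 1.00 whatever the reliability, so the gross score weight reduces to the ratio of the standard deviations.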

In any case, whether $\sigma_1 = \sigma_2$ or not, if the partial regression
equation be used for weighting Form 2 with respect to Form 1,
the correlation with the criterion of the combined Forms 1 and
2, each being weighted with true partial regression importances
of 1.00, is given by the following formula:

$$ r_{I(1+2)} = \sqrt{\frac{r_{I1}^2 + r_{I2}^2 - 2\, r_{I1}\, r_{I2}\, r_{12}}{1 - r_{12}^2}}. $$

Substituting $r_{I1} = r_{I2}$ in expressions containing $r_{I2}$, and factoring
the denominator,

$$ r_{I(1+2)} = \sqrt{\frac{2\, r_{I1}^2\, (1 - r_{12})}{(1 - r_{12})(1 + r_{12})}} = \sqrt{2} \cdot r_{I1} \cdot \sqrt{\frac{1}{1 + r_{12}}}. $$

The correlation with the criterion of two multiples of a given test
(when weighted by the regression equation) is,

$$ r_{I(1+2)} = \sqrt{2} \cdot r_{I1} \cdot \sqrt{\frac{1}{1 + r_{12}}}. $$

It is probably better to take for $r_{I1}$ the mean, $\frac{r_{I1} + r_{I2}}{2}$.


When the reliability of the test is 1.00, $r_{I(1+2)} = r_{I1}$, or there is
no value in giving a second test if the scores on the second test
are identical with the first; when $r_{12}$ is 0, $r_{I(1+2)} = \sqrt{2} \cdot r_{I1}$; when
$r_{12}$ is negative, $r_{I(1+2)}$ is even larger than $\sqrt{2} \cdot r_{I1}$. It is of course
highly improbable that one could construct a test in which both
$r_{I1} = r_{I2}$ and $r_{12} = 0$, or $r_{12} < 0$; it is of course possible when $r_{I1}$ is
almost zero, but in that case the test is too invalid to be of
practical use anyway. In the above formula, $r_{12}$ may be taken as the
correlation of odds-evens, $r_{I1}$ the correlation of odds with
criterion, $r_{I2}$ the correlation of evens with the criterion. Successive
applications of the above formula may then be made, in each case
calculating the value of $r'_{12}$ by the formula,

$$ r'_{12} = \frac{2\, r_{12}}{1 + r_{12}}, $$

in which $r'_{12}$ is a reliability coefficient of the double of the previous test,
which previous test has a reliability, $r_{12}$. In this way, one may
determine the correlation of 2, 4, 8, 16, 32, 64, etc., multiples of
the present halves, or wholes, of the present test, and plot a curve
of multiples and time, both used as abscissae, showing the
increasing validity to be secured by longer tests. A resort to this
formula should decide whether, for the results secured, it is better
to lengthen the test or to try out different test content.
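
The successive application of the two formulae can be sketched as follows. The starting values (validity .40, reliability .60) are hypothetical and the function names are illustrative; the code simply repeats the doubling step described above and records the validity at each multiple.

```python
from math import sqrt

def doubled_validity(r_i1, r_12):
    """Correlation with the criterion of two multiples of a test whose
    validity is r_i1 and whose reliability is r_12."""
    return sqrt(2) * r_i1 * sqrt(1 / (1 + r_12))

def doubled_reliability(r_12):
    """Reliability of the double of a test whose reliability is r_12."""
    return 2 * r_12 / (1 + r_12)

def validity_by_doubling(r_i1, r_12, doublings):
    """List of (multiple, validity) pairs for 1, 2, 4, ... multiples."""
    curve = [(1, r_i1)]
    multiple = 1
    for _ in range(doublings):
        r_i1 = doubled_validity(r_i1, r_12)   # validity of the doubled test
        r_12 = doubled_reliability(r_12)      # its reliability, for the next step
        multiple *= 2
        curve.append((multiple, r_i1))
    return curve

# Hypothetical test: validity .40, reliability .60.
curve = validity_by_doubling(0.40, 0.60, doublings=6)
for multiple, validity in curve:
    print(multiple, round(validity, 3))
```

Each doubling adds less than the one before, which is the information needed to judge whether further lengthening is worth the testing time.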


A Plot for Solving the Formula, $r_{I(1+2)} = \sqrt{2} \cdot r_{I1} \cdot \sqrt{\frac{1}{1 + r_{12}}}$

Since this equation may be written

$$ r_{I(1+2)} = \sqrt{\frac{2}{1 + r_{12}}} \cdot r_{I1}, $$

the family of curves for representative values of $r_{12}$ may be drawn
as straight lines in which abscissae are $r_{I1}$; each line of slope,
$m = \sqrt{\frac{2}{1 + r_{12}}}$, is named for the $r_{12}$ from which it is derived; and
ordinates are the sought values of $r_{I(1+2)}$, or correlation of two
multiples of the present test with the criterion.
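
The slopes needed to draw such a chart can be tabulated in a few lines. This is a sketch only; the set of reliabilities chosen and the names `slope` and `chart` are assumptions for illustration.

```python
from math import sqrt

def slope(r_12):
    """Slope of the straight line for reliability r_12."""
    return sqrt(2 / (1 + r_12))

# One line per representative reliability.
chart = {r_12: slope(r_12) for r_12 in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)}
for r_12, m in chart.items():
    print(r_12, round(m, 3))

# Reading the chart: the ordinate is the slope times the abscissa.
# Hypothetical test of validity .40 and reliability .60.
validity_of_double = chart[0.6] * 0.40
```

The line for a reliability of 1.00 has a slope of 1.00, which restates the earlier observation that doubling a perfectly reliable test adds nothing.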


The  Determination  of  the  Limits  within  Which  Variables 
may  be  Correlated 

It is frequently desirable to know the limits within which
variables may be correlated with each other as dependent upon
existing relationships. As an example, consider the problem: If
tests of "mechanical aptitude" correlate to an extent of .60, say,
with a certain mechanical criterion, and these same tests correlate
with general intelligence to the extent of, say, .50; then what
are the limits within which we must locate the correlation of
the mechanical criterion and intelligence; in other words, how
little and how much can intelligence be related to mechanical
ability?

The following diagram is very helpful in visualizing the
relationships involved in the formulae:

[Diagram not reproduced.]

In the above case, $I$ is the criterion, $A$ the mechanical test, and
$B$ the intelligence test.

By the conditions, $r_{IA} = .60$; $r_{AB} = .50$;
and it is required to know the limits within which $r_{IB}$ may vary.
These are found by certain manipulations of the formula for
multiple correlation. The multiple correlation coefficient, used
in determining the efficiency of a partial regression weighted
scale of A and B in predicting the criterion, is given by the
formula,

$$ r_{IC} = \sqrt{\frac{r_{IA}^2 + r_{IB}^2 - 2\, r_{IA}\, r_{IB}\, r_{AB}}{1 - r_{AB}^2}}. $$


Obviously with the values of any three of the variables of the
above equation given, we may solve for the value of the fourth;
it may also be advantageous in certain cases, with two of the
values of the variables given, to plot the curve showing the
dependence of the third upon the fourth, whereupon for any
seemingly plausible value of the one we may secure from the curve
the value of the other. This latter procedure would be of value
in such a problem as "Given $r_{IA} = .60$, what is the maximum
value of $r_{IB}$ as dependent upon the correlation, $r_{AB}$, whose exact
value we do not know?"

The limits within which $r_{IB}$ may vary depend for their solution
upon two theorems:

A.  The maximum value of $r_{IB}$ is found when A and B together
predict the criterion perfectly, i.e., when $r_{IC} = 1.00$.

B.  The minimum value of $r_{IB}$ is found when B predicts no
element of the criterion not already predicted by A; that is, when
$r_{IA}$ is not increased in size by the combination with it of B; or
finally, when $r_{IC} = r_{IA}$.

By use of the above two theorems in our problem, we see that
the maximum value of $r_{IB}$ is found by solving for $r_{IB}$ the
equation:

$$ 1.00 = \sqrt{\frac{(.60)^2 + r_{IB}^2 - 2(.60)(r_{IB})(.50)}{1 - (.50)^2}}, $$

whence, $r_{IB} = .993$.

It is rather disconcerting to know that, with the relationships
given (which are in about the magnitudes found in some of our
paper mechanical tests), B must correlate to the extent of .993
with the criterion in order that A and B combined will predict the
criterion perfectly.

The minimum value of $r_{IB}$ is found by use of the second
theorem, by solving for $r_{IB}$ in the equation,

$$ .60 = \sqrt{\frac{(.60)^2 + r_{IB}^2 - 2(.60)(r_{IB})(.50)}{1 - (.50)^2}}, $$

whence, $r_{IB} = .30$.
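
The two limits can be checked numerically. The sketch below assumes the standard two-variable multiple correlation formula and solves the two theorems in closed form; the function names are hypothetical, and "minimum" is used only in the sense of theorem B.

```python
from math import sqrt

def multiple_r(r_ia, r_ib, r_ab):
    """Multiple correlation of the criterion I with A and B combined."""
    return sqrt((r_ia**2 + r_ib**2 - 2 * r_ia * r_ib * r_ab) / (1 - r_ab**2))

def max_r_ib(r_ia, r_ab):
    """Theorem A: larger root of multiple_r(r_ia, r_ib, r_ab) = 1.00."""
    return r_ia * r_ab + sqrt((1 - r_ia**2) * (1 - r_ab**2))

def min_r_ib(r_ia, r_ab):
    """Theorem B: the value of r_ib for which multiple_r equals r_ia."""
    return r_ia * r_ab

upper = max_r_ib(0.60, 0.50)
lower = min_r_ib(0.60, 0.50)
print(round(upper, 3), round(lower, 3))

# Substituting the limits back recovers the conditions of the theorems.
check_a = multiple_r(0.60, upper, 0.50)   # close to 1.00
check_b = multiple_r(0.60, lower, 0.50)   # close to .60
```

Setting the intercorrelation to zero in `max_r_ib` gives the square root of one minus the squared validity, which is .80 for a test that correlates .60 with the criterion.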

A special case of interest in solving for $r_{IB}$ as dependent for its
value upon $r_{AB}$ is the case where $r_{AB} = 0$. If, as above, $r_{IA} = .60$;
then when $r_{IC} = 1.00$ and when $r_{AB} = 0$, the value of $r_{IB}$ indicates
the "correlation with the criterion of all other factors totally
unrelated to A, the mechanical test."


Thus,

$$ 1.00 = \sqrt{\frac{r_{IA}^2 + r_{IB}^2 - 2\, r_{IA}\, r_{IB}\, (0)}{1 - (0)^2}}, $$

whence, in this special case,

$$ r_{IB} = \sqrt{1 - r_{IA}^2} = \sqrt{1 - (.60)^2} = .80. $$

It is rather startling to discover that, with a test which
correlates .6 with a criterion, "all other totally unmeasured factors"
correlate with this same criterion to the extent of .8. If the
factors which we might add to our present test, A, to perfectly
predict the criterion, do correlate positively to any extent with A,
then these other factors will correlate even more than .8 with the
criterion.

Another case of interest is to assume $r_{IA} = r_{IB}$ and $r_{AB} = 0$,
when $r_{IC} = 1.00$. Solving for $r_{IA}$ in the formula,

$$ 1.00 = \sqrt{\frac{r_{IA}^2 + r_{IA}^2 - 2(r_{IA})(r_{IA})(0)}{1 - (0)^2}}, $$

$$ r_{IA} = r_{IB} = \sqrt{.50} = .707. $$

Or, two tests totally unrelated to each other (0 correlation between
them) may yet each correlate to the same extent, a maximum
correlation with a criterion of .707. In popular language, a
"genius" on the one test is as likely as not to be rated "idiot" on
the other, and yet both tests correlate with the criterion to the
extent of .707.

We have just proved that Test V (specific vocational aptitude
test) and Test S (specific school aptitude test) may correlate zero
with each other and may yet correlate equally to the maximum
extent of .707 with a valid vocational criterion. If now, as a
practical school procedure, we were to consider for a vocational
course all those who are below average in Test S, and who
presumably are doing either failing or else poor work in school, we
would reduce considerably the range of ability left in the academic
work for future progress. Dr. Ruger¹ has shown that the
standard deviation of the reduced group is given by the formula,

[formula not reproduced]

when the standard deviation of the entire distribution is
considered to be 1.00. That is, the ratio $\sigma / \Sigma = .60$ in Dr. Kelley's
formula,

[formula not reproduced]

Solving, $r = .514$.

In other words, the elimination from the academic school of all
people below the average in Test S reduces the maximum possible
correlation of Test S ("intelligence") with the vocational criterion
to .514² in the group of "survivors." At the same time, those
eliminated and now taking up vocational work will still correlate
.707 with this same vocational criterion; this correlation is not
affected by the division of the group made on the basis of a test
with which Test V correlates zero.

¹ We desire to express our thanks to Dr. Ruger for the derivation of this formula.

² However, those going on for more academic work are of homogeneous and
superior academic ability.
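
The two figures quoted above, .60 and .514, can be reproduced under assumptions that the text does not state in full, so the following is only one plausible reading. It assumes (a) that Test S scores are normally distributed, so that the group above the average has a standard deviation of sqrt(1 - 2/pi) times the original, and (b) that the correlation in the curtailed group follows the usual restriction-of-range relation, r = Rk / sqrt(1 - R² + R²k²), where k is the ratio of the curtailed to the full standard deviation. Neither form is confirmed by the text as the one supplied by Dr. Ruger or Dr. Kelley, and the function names are hypothetical.

```python
from math import pi, sqrt

def sd_above_mean():
    """Standard deviation of the upper half of a normal distribution
    whose full standard deviation is 1.00 (assumption a)."""
    return sqrt(1 - 2 / pi)

def restricted_r(full_r, k):
    """Correlation in a group whose standard deviation on the selection
    variable has been cut to k times its full value (assumption b)."""
    return full_r * k / sqrt(1 - full_r**2 + (full_r * k) ** 2)

k_exact = sd_above_mean()

# The text rounds the ratio to .60 before solving; the same is done here.
r_survivors = restricted_r(sqrt(0.50), 0.60)
print(round(k_exact, 3), round(r_survivors, 3))
```

Under these assumptions the ratio comes out a little above .60, and the correlation among the "survivors" agrees with the printed .514 to three places.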